3.3. Results

The goal of a Results object is to take care of the job folder after execution has finished: it gathers information about the produced files, helps to manage them and extracts data of interest from them. Every Job instance has an associated Results instance, created automatically on job creation and stored in its results attribute.

From a technical standpoint, the Results class is the part of the PLAMS environment responsible for thread safety and proper synchronization in parallel job execution.

3.3.1. Files in the job folder

Directly after the execution of a job is finished (see Running a job), the job folder is scanned by the collect() method. All files present in the job folder, including files in subfolders, are gathered in a list stored in the files attribute of the Results instance. Entries in this list are paths relative to the job folder, so files on the top level are stored by their names and files in subfolders as, for example, childjob/childjob.out.

Note

Files produced by Pickling are excluded from the files list. Every file with .dill extension is simply ignored by Results.
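
The scan described above can be pictured with a few lines of standard-library Python. This is only a minimal sketch of the idea, not the PLAMS implementation: the function name list_job_files is hypothetical, and the only behavior it mirrors is what the text states (relative paths, subfolders included, .dill files skipped).

```python
import os

def list_job_files(job_folder):
    """Hypothetical helper: relative paths of all files under job_folder."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(job_folder):
        for name in filenames:
            if name.endswith('.dill'):
                continue  # pickle files are ignored, as described in the note
            full_path = os.path.join(dirpath, name)
            # store the path relative to the job folder, e.g. childjob/childjob.out
            found.append(os.path.relpath(full_path, job_folder))
    return sorted(found)
```

Sorting is a choice made for this sketch so that the result is reproducible; the text does not say in which order entries appear in files.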

If you need an absolute path to some file, the bracket notation known from dictionaries is defined for Results objects. When supplied with an entry from the files list, it returns the absolute path to that file. The bracket notation is read-only:

>>> r = j.run()
>>> print(r.files)
['plamsjob.err', 'plamsjob.in', 'plamsjob.out', 'plamsjob.run']
>>> print(r['plamsjob.out'])
/home/user/plams.12345/plamsjob/plamsjob.out
>>> r['newfile.txt'] = '/home/user/abc.txt'
TypeError: 'Results' object does not support item assignment

In the bracket notation, and in every other context regarding Results, whenever you need to pass a string with a filename, a shortcut $JN can be used for the job name:

>>> r.rename('$JN.out', 'outputfile')
>>> r.grep_file('$JN.err', 'NORMAL TERMINATION')
>>> print(r['$JN.run'])
/home/user/plams.12345/plamsjob/plamsjob.run

Some programs produce files with fixed names during execution (for example ADF’s TAPE21). If you want such files to be renamed automatically, use the _rename_map class attribute – a dictionary defining which files should be renamed and how:

>>> print(ADFResults._rename_map)
{'TAPE13': '$JN.t13', 'TAPE21': '$JN.t21'}

Renaming is done during collect(). In the example above, if a file named TAPE21 is found in the job folder, it is renamed to [jobname].t21. If it’s not there, nothing happens (no error or exception is raised).

In the generic Results class _rename_map is an empty dictionary.
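
To make the renaming step concrete, here is a minimal sketch of how a rename map with the $JN shortcut could be applied to a folder. The function apply_rename_map is hypothetical and written only for illustration; it reproduces the two properties stated above: $JN is replaced by the job name, and a missing file is skipped silently.

```python
import os

def apply_rename_map(job_folder, jobname, rename_map):
    """Hypothetical helper: rename files listed in rename_map, expanding $JN."""
    renamed = {}
    for old, new in rename_map.items():
        old_path = os.path.join(job_folder, old)
        if not os.path.isfile(old_path):
            continue  # file not produced: nothing happens, no exception
        new_name = new.replace('$JN', jobname)
        os.rename(old_path, os.path.join(job_folder, new_name))
        renamed[old] = new_name
    return renamed
```

For a job named plamsjob whose folder contains TAPE21 but not TAPE13, calling this function with the ADF map shown above renames TAPE21 to plamsjob.t21 and leaves everything else untouched.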

3.3.2. Synchronization of parallel job executions

One of the main advantages of PLAMS is the ability to run jobs in parallel. The whole job execution logic is designed in a way that does not require a special parallel script for a parallel workflow execution. Exactly the same scripts can be used for both serial and parallel execution.

However, it is important to have a basic understanding of how parallelism in PLAMS works to avoid potential deadlocks and maximize the performance of your scripts.

To run your job in parallel you need to use a parallel job runner:

pjr = JobRunner(parallel=True)
myresults = myjob.run(jobrunner=pjr)

Parallelism is not something that is “on” or “off” for the entire script: within one script you can use multiple job runners, some of them parallel and some serial. However, if you wish to always use the same JobRunner instance, it is convenient to set it as the default at the beginning of your script:

config.default_jobrunner = JobRunner(parallel=True)

All run() calls without explicit jobrunner argument will now use that instance.

When you run a job using a serial job runner, all steps of run() (see Running a job) are done in the main thread and a Results instance is returned at the end. On the other hand, when a parallel job runner is used, a new thread is spawned at the beginning of run() and all further work is done in this thread. Meanwhile the main thread proceeds with the next part of the script. The important thing is that the run() method called in the main thread returns a Results instance and allows the whole script to proceed even though the job is still running in a separate thread. This Results instance acts as a “guardian” protecting the job from being accessed while it is still running. Every time you call a method of a Results instance, the guardian checks the status of the job and, if the job is not yet finished, forces the thread from which the call was made to wait. Thanks to that there is no need to explicitly put synchronization points in the script: results requests serve that purpose.

Warning

You should NEVER access results in any other way than by a method of some Results instance.

The Results class is designed in such a way that each of its methods automatically gets wrapped with the access guardian when a Results instance is created. This behavior holds for any Results subclass and for new methods defined by the user, so there is no need to worry about the guardian when extending Results functionality or subclassing it. Binding decorators also recognize when you try to use them with Results and act accordingly. Methods whose names end with two underscores, as well as refresh(), collect() and _clean(), are not wrapped with the guardian. The guardian gives special privileges (earlier access) to postrun() and check() (see Prerun and postrun methods).

If you never request any results of your job and just want to run it, the finish() method works as a global synchronization point. It waits for all spawned threads to end before cleaning the environment and exiting your script.
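
The guardian idea can be illustrated without PLAMS, using only the standard threading module. The sketch below is an assumption-laden toy, not the actual PLAMS code: the names guarded, GuardedResults and run_in_background are hypothetical, and a single threading.Event stands in for the job status that the real guardian checks.

```python
import functools
import threading

def guarded(method):
    """Hypothetical guardian: block the caller until the job has finished."""
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        self.finished.wait()  # returns at once if the job is already done
        return method(self, *args, **kwargs)
    return wrapper

class GuardedResults:
    """Toy stand-in for a Results instance."""
    def __init__(self):
        self.finished = threading.Event()
        self.data = None

    @guarded
    def get_data(self):
        return self.data

def run_in_background(work):
    """Start work() in a new thread and return the results object immediately."""
    results = GuardedResults()

    def target():
        try:
            results.data = work()
        finally:
            results.finished.set()  # release waiting callers even on failure

    threading.Thread(target=target).start()
    return results
```

Calling run_in_background(some_function) returns right away, and the first call to get_data() becomes the synchronization point, just as a results request does in a PLAMS script.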

3.3.2.1. Examples

This section provides a handful of examples together with an explanation of common pitfalls and good practices one should keep in mind when writing parallel PLAMS scripts.

Let us start with a simple parallel script that takes all .xyz files in a given folder and for each one calculates the dipole moment magnitude using a single point ADF calculation:

 1  config.default_jobrunner = JobRunner(parallel=True)
 2  config.log.stdout = 1
 3  folder = '/home/user/xyz'
 4  molecules = read_molecules(folder)
 5
 6  s = Settings()
 7  s.input.basis.type = 'DZP'
 8  s.input.geometry.SP = True
 9  s.input.xc.gga = 'PBE'
10
11  jobs = [ADFJob(molecule=molecules[name], name=name, settings=s) for name in sorted(molecules)]
12  results = [job.run() for job in jobs]
13
14  for r in results:
15      dipole_vec = r.readkf('Properties', 'Dipole')
16      dipole_magn = sum([a*a for a in dipole_vec])**0.5
17      print('{}\t\t{}'.format(r.job.name, dipole_magn))

For the sake of explanation let us assume that /home/user/xyz contains three files: ammonia.xyz, ethanol.xyz and water.xyz. When you run this script the standard output will look something like:

[17:31:52] PLAMS working folder: /home/user/plams_workdir
[17:31:52] JOB 'ammonia' STARTED
[17:31:52] JOB 'ethanol' STARTED
[17:31:52] JOB 'water' STARTED
[17:31:52] Waiting for job ammonia to finish
[17:31:56] JOB 'water' SUCCESSFUL
[17:31:56] JOB 'ammonia' SUCCESSFUL
ammonia     0.5949493793257181
[17:31:56] Waiting for job ethanol to finish
[17:32:01] JOB 'ethanol' SUCCESSFUL
ethanol     0.5946259677193089
water       0.7082267171673067

As you can see, the output of the print call in line 17 is mixed with automatic logging messages. Let us examine in more detail what causes this behavior by following the main thread. In line 11 an alphabetically sorted list of jobs is created, so the job named 'ethanol' comes after 'ammonia' and before 'water'. Line 12 is a list comprehension that goes along the list of jobs, runs each of them and collects their Results instances in a new list called results. If we were using a serial job runner, all the computational work would happen in line 12: the 'ethanol' job would start only when 'ammonia' was finished, 'water' would wait for 'ethanol' and the main thread would proceed to the next line only when 'water' was done.

In our case, however, we are using a parallel job runner. The first job ('ammonia') is started and quickly moves to a separate thread, allowing the main thread to proceed to the next instruction, which in this case is the run() method of the 'ethanol' job. Thanks to that all three jobs are started almost immediately one after another, the corresponding Results are gathered and the main thread proceeds to line 14, while the three jobs are running “in the background”, handled by separate threads.

Now the main thread goes along the results list (which follows the same order as jobs) and tries to obtain a dipole vector from each job. It uses the readkf method of the Results instance associated with the 'ammonia' job and, since this job is still running, the main thread hangs and waits for the job to finish (“Waiting for job ammonia to finish”). Meanwhile we can see that the 'water' job ends and this fact is logged. Shortly after that the 'ammonia' job also finishes and the main thread obtains dipole_vec, calculates dipole_magn and prints it. Now the for loop in line 14 continues, this time for the 'ethanol' job. This job takes a bit longer than 'ammonia', so it is still running and the main thread again hangs on the readkf method (“Waiting for job ethanol to finish”). After finally obtaining the dipole vector of ethanol, calculating the magnitude and printing it, the for loop goes on with its last iteration, the 'water' job. This time there is no need to wait since the job is already finished: the result is calculated and printed immediately.

Knowing that, let us consider what would happen if the order of jobs were different. If 'ethanol' were the first job on the list, by the time its results were obtained and printed, both other jobs would have finished, so no further waiting would be needed. On the other hand, if the order were 'water', 'ammonia', 'ethanol', the main thread would have to wait every time it executed line 15.

The most important lesson from the above is: the order in which you start jobs does not matter (too much), it is the order of results requests that makes the difference. Of course in our very simple example it influences only the way in which results are mixed with log messages, but in more complicated workflows it can directly affect the total runtime of your script.

By the way, to avoid print statements being mixed with logging messages one could first store the data and print it only when all the results are ready:

to_print = []
for r in results:
    dipole_vec = r.readkf('Properties', 'Dipole')
    dipole_magn = sum([a*a for a in dipole_vec])**0.5
    to_print += [(r.job.name, dipole_magn)]
for nam, dip in to_print:
    print('{}\t\t{}'.format(nam, dip))

Another way could be disabling logging to the standard output by putting config.log.stdout = 0 at the beginning of the script (see log()).

Coming back to the main topic, as we have seen above, parallelism in PLAMS is driven by results requests. Not only is the order of requests important, but also (probably even more so) the place from which they are made. To illustrate this we will use the following script, which performs a geometry optimization followed by a frequencies calculation on the optimized geometry:

 1  config.default_jobrunner = JobRunner(parallel=True)
 2
 3  go = ADFJob(name='GeomOpt', molecule=Molecule('geom.xyz'))
 4  go.settings.input.geometry.go = True
 5  ... #other settings adjustments for geometry optimisation
 6  go_results = go.run()
 7
 8  opt_geo = go_results.get_main_molecule()
 9
10  freq = ADFJob(name='Freq', molecule=opt_geo)
11  freq.settings.input.geometry.frequencies = True
12  ... #other settings adjustments for frequency run
13  freq_results = freq.run()
14
15  do_other_work() # further part of the script, independent of GeomOpt and Freq

Again let us follow the main thread. In line 8 we can see a results request for the optimized geometry from “GeomOpt” job. The main thread will wait for that job to finish before preparing the “Freq” job and running it. That means do_other_work(), whatever it is, will not start before “GeomOpt” is done, even though it could, since it is independent of GeomOpt and Freq results. This is bad. The main thread is wasting time that could be used for do_other_work() on idle waiting. We need to fix the script:

config.default_jobrunner = JobRunner(parallel=True)

go = ADFJob(name='GeomOpt', molecule=Molecule('geom.xyz'))
go.settings.input.geometry.go = True
... #other settings adjustments for geometry optimisation
go_results = go.run()

freq = ADFJob(name='Freq')
freq.settings.input.geometry.frequencies = True
... #other settings adjustments for frequency run

@add_to_instance(freq)
def prerun(self):
    self.molecule = go_results.get_main_molecule()

freq_results = freq.run()

do_other_work() # further part of the script, independent of GeomOpt and Freq

The results request (go_results.get_main_molecule()) has been moved from the main script to the prerun() method of the “Freq” job. The prerun() method is executed in the job’s thread rather than in the main thread. That means the main thread starts the “Freq” job immediately after starting the “GeomOpt” job and then directly proceeds to do_other_work(). Meanwhile, in the thread spawned for the “Freq” job, the results request for the molecule is made and only that thread waits for “GeomOpt” to finish.

As seen in the above example, it is extremely important to properly configure jobs that are dependent (setup of one depends on results of another). Resolving all such dependencies in job’s thread rather than in the main thread guarantees that waiting for results is done only by the code that really needs them.

Note

In some cases dependencies between jobs are not easily expressed via methods of Results (for example, one job sets up some environment that is later used by another job). In such cases one can use the job’s depend attribute to explicitly tell the job about other jobs it has to wait for. Adding job2 to job1.depend is roughly equivalent to putting job2.results.wait() in the prerun() of job1.

To sum up all the above considerations, here is the rule of thumb on how to write properly working parallel PLAMS scripts:

  1. Request results as late as possible, preferably just before using them.
  2. If possible, avoid requesting results in the main thread.
  3. Place the result request in the thread in which that data is later used.
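
The effect of these rules can be reproduced in miniature with the standard concurrent.futures module. The sketch below is not PLAMS code: geom_opt, freq and run_workflow are hypothetical stand-ins, and a future plays the role of a Results instance. The dependent task asks for the first result inside its own worker thread, so the main thread is free to carry on.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

def run_workflow():
    release = threading.Event()  # stands in for "GeomOpt finished computing"
    log = []

    def geom_opt():
        release.wait(timeout=5)  # pretend this is a long calculation
        log.append('GeomOpt done')
        return 'optimized geometry'

    def freq(go_future):
        # the results request is made here, in the worker thread
        molecule = go_future.result()
        log.append('Freq done')
        return 'frequencies of ' + molecule

    with ThreadPoolExecutor(max_workers=2) as pool:
        go_future = pool.submit(geom_opt)
        freq_future = pool.submit(freq, go_future)
        log.append('other work done')  # main thread was not blocked
        release.set()
        result = freq_future.result()  # requested as late as possible
    return log, result
```

The returned log starts with 'other work done', which shows that the main thread reached its independent work before either job had finished.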

3.3.3. Cleaning job folder

The Results instance associated with a job is responsible for cleaning the job folder (removing files that are no longer needed). Cleaning is done automatically, twice for each job.

The first cleaning is done during run(), just after check() and before postrun(). The value controlling this first cleaning is taken from myjob.settings.keep and should be either a string or a list (see below).

This cleaning is intended for situations when your jobs produce large files that you don’t need for further processing. Running many such jobs could deplete the disk space and cause the whole script to crash. If you wish to immediately get rid of some files produced by your jobs (without having a chance to do anything with them), use the first cleaning.

In the majority of cases it is sufficient to use the second cleaning, which is performed at the end of your script, when finish() method is called. It is adjusted by myjob.settings.save. You can use the second cleaning to remove files that you no longer need after you extracted relevant data earlier in your script.

The argument passed to _clean() (in other words the value that is supposed to be kept in myjob.settings.keep and myjob.settings.save) can be one of the following:

  • 'all' – nothing is removed, cleaning is skipped.
  • 'none' or [] or None – everything is removed from the job folder.
  • list of strings – list of filenames to be kept. Shortcut $JN can be used here, as well as *-wildcards. For example ['geo.*', '$JN.out', 'logfile'] will keep [jobname].out, logfile and all files whose names start with geo. and remove everything else from the job folder.
  • list of strings with the first element '-' – reversed behavior to the above, listed files will be removed. For example ['-', 't21.*', '$JN.err'] will remove [jobname].err and all files whose names start with t21.
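
The four forms of the cleaning argument can be summarized in a short function. This is a sketch of the selection logic only, under the assumption that wildcards behave like shell-style patterns from the standard fnmatch module; the name files_to_remove is hypothetical and nothing is actually deleted.

```python
import fnmatch

def files_to_remove(files, arg, jobname):
    """Hypothetical helper: which entries of files would a cleaning remove?"""
    if arg == 'all':
        return []                  # nothing is removed
    if arg is None or arg == 'none' or arg == []:
        return list(files)         # everything is removed
    patterns = list(arg)
    reverse = patterns[0] == '-'   # leading '-' flips keep into remove
    if reverse:
        patterns = patterns[1:]
    patterns = [p.replace('$JN', jobname) for p in patterns]
    matched = [f for f in files
               if any(fnmatch.fnmatchcase(f, p) for p in patterns)]
    if reverse:
        return matched
    return [f for f in files if f not in matched]
```

With files geo.xyz, water.out, water.err, logfile and t21.H in the folder of a job named water, both ['geo.*', '$JN.out', 'logfile'] and ['-', 't21.*', '$JN.err'] select water.err and t21.H for removal.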

3.3.3.1. Cleaning for multijobs

Cleaning happens for every job run with PLAMS, either single job or multijob. That means, for example, that a single job that is a child of some multijob will have its job folder cleaned by two different Results instances: its own Results and its parent’s Results. Those two cleanings can interfere with each other. Hence it is good practice to set cleaning on one level only (either in the parent job or in the child jobs) and disable cleaning on the other level by using 'all'.

Another shortcut can be used for cleaning in multijobs: $CH is expanded with every possible child name. For example, if you have a multijob mj with 5 single job children (child1, child2 and so on) and you wish to keep only input and output files of children jobs you can set:

mj.settings.save = ['$CH/$CH.in', '$CH/$CH.out']

It is equivalent to:

mj.settings.save = ['child1/child1.in', 'child2/child2.in', ... , 'child1/child1.out', 'child2/child2.out', ...]

As you can see above, while cleaning a multijob folder you have to keep in mind that files in subfolders are kept as relative paths.
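
The $CH expansion shown above amounts to a simple substitution repeated for every child. A minimal sketch follows; the function name expand_children is hypothetical and the code illustrates the described result, not the PLAMS internals.

```python
def expand_children(patterns, child_names):
    """Hypothetical helper: expand the $CH shortcut for every child name."""
    expanded = []
    for pattern in patterns:
        if '$CH' in pattern:
            # every $CH in one pattern is replaced by the same child name
            expanded.extend(pattern.replace('$CH', child) for child in child_names)
        else:
            expanded.append(pattern)
    return expanded
```

For children child1 and child2, the list ['$CH/$CH.in', '$CH/$CH.out'] expands to child1/child1.in, child2/child2.in, child1/child1.out and child2/child2.out, in that order.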

3.3.4. API

class Results(job)

General concrete class for job results.

job attribute stores a reference to associated job. files attribute is a list with contents of the job folder. _rename_map is a class attribute with the dictionary storing the default renaming scheme.

Bracket notation (myresults[filename]) can be used to obtain full absolute paths to files in the job folder.

Instance methods are automatically wrapped with the “access guardian” that ensures thread safety (see Synchronization of parallel job executions).

refresh()

Refresh the contents of the files list. Traverse the job folder (and all its subfolders) and collect relative paths to all files found there, except files with .dill extension.

This is a cheap and fast method that should be used every time there is a risk the contents of the job folder changed and files is no longer up-to-date. For proper working of various PLAMS elements it is crucial that files always contains up-to-date information about the contents of the job folder.

All functions and methods defined in PLAMS that could change the state of the job folder refresh the files list, so there is no need to manually call refresh() after, for example, rename(). If you are implementing a new method of that kind, please don’t forget about refreshing.

collect()

Collect the files present in the job folder after execution of the job is finished. This method is simply refresh() followed by renaming according to the _rename_map.

If you wish to override this function, you have to call the parent version at the beginning.

wait()

Wait for associated job to finish.

Technical

This is not an abstract method. It does exactly what it should: nothing. All the work is done by _restrict() decorator that is wrapped around it.

grep_file(filename, pattern='', options='')

Execute grep on a file given by filename and search for pattern.

Additional grep flags can be passed with options, which should be a single string containing all flags, space separated.

Returned value is a list of lines (strings). See man grep for details.

grep_output(pattern='', options='')

Shortcut for grep_file() on the output file.

awk_file(filename, script='', progfile=None, **kwargs)

Execute an AWK script on a file given by filename.

The AWK script can be supplied in two ways: either by directly passing the contents of the script (should be a single string) as the script argument, or by providing the path (absolute or relative to filename) to a text file with an AWK script as the progfile argument. If progfile is not None, script is ignored.

Other keyword arguments (**kwargs) can be used to pass additional variables to AWK (see the -v flag in the AWK manual).

Returned value is a list of lines (strings). See man awk for details.

awk_output(script='', progfile=None, **kwargs)

Shortcut for awk_file() on the output file.

rename(old, new)

Rename a file from files. In both old and new the shortcut $JN for job name can be used.

get_file_chunk(filename, begin=None, end=None, match=0, inc_begin=False, inc_end=False, process=None)

Extract a chunk of a text file given by filename, consisting of all the lines between a line containing begin and a line containing end.

begin and end should be simple strings (no regular expressions allowed) or None (in that case matching is done from the beginning or until the end of the file). If multiple blocks delimited by begin and end are present in the file, match can be used to indicate which one should be printed (match=0 prints all of them). inc_begin and inc_end can be used to include the delimiting lines in the final result (by default they are excluded).

The returned value is a list of strings. process can be used to provide a function executed on each element of this list before returning it.
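
The interplay of begin, end, match, inc_begin, inc_end and process is easiest to see on a small example. The function below is a hypothetical re-implementation written for this purpose; it works on a list of lines instead of a file and assumes that blocks are counted from 1, so it may differ from the PLAMS method in corner cases.

```python
def get_chunk(lines, begin=None, end=None, match=0,
              inc_begin=False, inc_end=False, process=None):
    """Hypothetical sketch: lines between a begin marker and an end marker."""
    result = []
    inside = begin is None          # no begin marker: collect from the start
    current = 1 if inside else 0    # number of the block being read
    for line in lines:
        if not inside:
            if begin is not None and begin in line:
                inside = True
                current += 1
                if inc_begin and match in (0, current):
                    result.append(line)
            continue
        if end is not None and end in line:
            if inc_end and match in (0, current):
                result.append(line)
            inside = False
            continue
        if match in (0, current):   # match=0 keeps every block
            result.append(line)
    if process is not None:
        result = [process(item) for item in result]
    return result
```

Markers are matched as plain substrings, in line with the "no regular expressions" remark above.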

get_output_chunk(begin=None, end=None, match=0, inc_begin=False, inc_end=False, process=None)

Shortcut for get_file_chunk() on the output file.

recreate_molecule()

Recreate the input molecule for the corresponding job based on files present in the job folder. This method is used by load_external().

The definition here serves as a default fall-back template preventing load_external() from crashing when a particular Results subclass does not define its own recreate_molecule().

recreate_settings()

Recreate the input Settings instance for the corresponding job based on files present in the job folder. This method is used by load_external().

The definition here serves as a default fall-back template preventing load_external() from crashing when a particular Results subclass does not define its own recreate_settings().

_clean(arg)

Clean the job folder. arg should be a string or a list of strings. See Cleaning job folder for details.

_copy_to(newresults)

Copy these results to newresults.

This method is used when Rerun prevention discovers an attempt to run a job identical to a previously run job. Instead of the execution, results of the previous job are copied/linked to the new one.

This method is called from the Results of the old job and newresults should be the Results of the new job. The goal is to faithfully recreate the state of this Results instance in newresults. To achieve that, all the contents of the job folder are copied (or hardlinked, if your platform allows that and job.settings.link_files is True) to the other job’s folder. Moreover, all attributes of this Results instance (other than job and files) are exported to newresults using the _export_attribute() method.

_export_attribute(attr, other)

Export this instance’s attribute to other. This method should be overridden in your Results subclass if it has some attributes that are not properly handled by copy.deepcopy().

other is the Results instance, attr is the value of the attribute to be copied. See SCMJob._export_attribute for an example implementation.

static _replace_job_name(string, oldname, newname)

If string starts with oldname, maybe followed by some extension, replace oldname with newname.
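
One way to read this rule is sketched below. The function replace_job_name is a hypothetical illustration, and it assumes that "some extension" means a single extension as split off by os.path.splitext; the actual method may treat other cases differently.

```python
import os

def replace_job_name(string, oldname, newname):
    """Hypothetical sketch: swap the job name in front of an optional extension."""
    stem, extension = os.path.splitext(string)
    if stem == oldname:
        return newname + extension
    return string  # anything else is returned unchanged
```

Under this reading water.out becomes h2o.out when water is replaced with h2o, while waterfall.out and logfile stay as they are.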

__getitem__(name)

Magic method to enable bracket notation. Elements from files can be used to get absolute paths.

_process_file(filename, command)

Skeleton for all file processing methods. Execute command (should be a list of strings) on filename and return output as a list of lines.

Technical

Other parts of the results module described below are responsible for giving the Results class its unique behavior described in Synchronization of parallel job executions. They are presented here for the sake of completeness; from a user’s perspective this information is not too relevant.

class _MetaResults

Metaclass for Results. During new Results instance creation it wraps all methods with _restrict() decorator ensuring proper synchronization and thread safety. Methods listed in _dont_restrict as well as “magic methods” are not wrapped.
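
For readers unfamiliar with the technique, the sketch below shows how a metaclass can wrap methods automatically, including those added in subclasses. All names in it (announce, WrappingMeta, dont_wrap, ToyResults, MyResults) are hypothetical, the wrapper merely records calls instead of synchronizing threads, and the wrapping is done when the class object is built, which is one common way to do it and not necessarily the moment PLAMS chooses.

```python
import functools

def announce(func):
    """Stand-in for a guarding decorator: record the call, then run the method."""
    @functools.wraps(func)
    def wrapper(self, *args, **kwargs):
        self.calls.append(func.__name__)
        return func(self, *args, **kwargs)
    return wrapper

class WrappingMeta(type):
    dont_wrap = ('refresh',)  # hypothetical list of methods left untouched

    def __new__(mcs, name, bases, namespace):
        for attr, value in list(namespace.items()):
            skipped = attr.endswith('__') or attr in mcs.dont_wrap
            if callable(value) and not skipped:
                namespace[attr] = announce(value)
        return super().__new__(mcs, name, bases, namespace)

class ToyResults(metaclass=WrappingMeta):
    def __init__(self):
        self.calls = []

    def refresh(self):
        return 'refreshed'

    def wait(self):
        return 'waited'

class MyResults(ToyResults):
    def energy(self):  # new method in a subclass, wrapped automatically
        return -1.0
```

After calling refresh(), wait() and energy() on a MyResults instance, its calls list contains only 'wait' and 'energy': the magic method and the excluded refresh() were left unwrapped.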

_restrict(func)

Decorator that wraps methods of Results instances.

Whenever a decorated method is called, the status of the associated job is checked. Depending on its value, access to the method is granted or refused, or the calling thread is forced to wait for the right event to be set.

_caller_name_and_arg(frame)

Extract information about the name and arguments of a function call from a frame object.

_privileged_access()

Analyze contents of the current stack to find out if privileged access to the Results methods should be granted.

Privileged access is granted to two Job methods: postrun() and check(), but only if they are called from _finalize() of the same Job instance.