3.2. Jobs¶
The job is the most important object in the PLAMS library. A job is the basic unit of computational work, and running jobs is the main goal of PLAMS scripts.
Jobs may differ considerably in their details, but they all follow a common set of rules defined in the abstract class Job.
Note
Being an abstract class means that the Job class has some abstract methods: methods that are declared but not implemented (they do nothing). Those methods are supposed to be defined in subclasses of Job. When a subclass of an abstract class defines all required abstract methods, it is called a concrete class. You should never create an instance of an abstract class, because when you try to use it, empty abstract methods are called and your script crashes.
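The distinction between abstract and concrete classes can be illustrated with Python’s standard abc module. This is only a sketch: the class names are hypothetical, and the Job class in PLAMS is not necessarily built on abc. Note one difference from the behavior described above: with abc the failure happens as soon as the abstract class is instantiated, not later when the missing method is called.

```python
from abc import ABC, abstractmethod


class AbstractTask(ABC):
    """Hypothetical abstract class: declares run_task() but does not implement it."""

    @abstractmethod
    def run_task(self):
        ...


class ConcreteTask(AbstractTask):
    """Concrete subclass: defines every required abstract method."""

    def run_task(self):
        return "done"


# Instantiating the abstract class is rejected by abc with a TypeError.
try:
    AbstractTask()
    abstract_ok = True
except TypeError:
    abstract_ok = False

# The concrete subclass can be instantiated and used normally.
result = ConcreteTask().run_task()
```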
Every job has its own unique name and a separate folder (called job folder, with the same name as the job) located in the main working folder. All files regarding that particular job (input, output, runscript, other files produced by job execution) end up in the job folder.
In general a job can be of one of two types: a single job or a multijob. These types are defined as subclasses of the Job class: SingleJob and MultiJob.
A single job represents a single calculation, usually done by executing an external binary (ADF, Dirac etc.). It creates a runscript that is then either executed locally or submitted to an external queueing system. Running a single job produces a handful of files, including dumps of the standard output and standard error streams together with any other files produced by the external binary. SingleJob is still an abstract class that is further subclassed by program-specific concrete classes such as ADFJob.
A multijob, on the other hand, does not run any calculation by itself. It is a container for other jobs, used to aggregate smaller jobs into bigger ones. There is no runscript produced by a multijob. Instead, it contains a list of subjobs called children that are run together when the parent job is executed. Children jobs can in turn be either single jobs or multijobs. The job folder of each child job is a subfolder of its parent’s job folder, so the folder hierarchy fully corresponds to the job child/parent hierarchy. MultiJob is a concrete class, so you can create its instances and run them.
3.2.1. Preparing a job¶
The first step in running a job with PLAMS is to create a job object. You need to pick a concrete class that defines the type of job you want to run (ADFJob will be used as an example here) and create an instance of it:
>>> myjob = ADFJob(name='myfirstjob')
Various keyword arguments (arguments of the form arg=value, like name in the example above) can be passed to a job constructor, depending on the type of your job. However, the following keyword arguments are common to all types of jobs:
- name – a string containing the name of the job. If not supplied, the default name plamsjob is used. A job’s name cannot contain the path separator (/ on Linux, \ on Windows).
- settings – a Settings instance to be used by this job. It gets copied (using copy()), so you can pass the same instance to several different jobs and changes made afterwards won’t interfere. Any instance of Job can also be passed as the value of this argument. In that case the Settings associated with the passed job are copied.
- depend – a list of jobs that need to be finished before this job can start. This is useful when you want to execute your jobs in parallel. Usually there is no need to use this argument, since dependencies between jobs are resolved automatically (see Synchronization of parallel job executions). However, sometimes one needs to state such a dependency explicitly, and this option is then helpful.
Those values do not need to be passed to the constructor, they can be set or changed later (but they should be fixed before the job starts to run):
>>> myjob = ADFJob()
>>> myjob.name = 'myfirstjob'
>>> myjob.settings.runscript.pre = 'echo HelloWorld'
Single jobs accept an additional keyword argument, molecule, which should be a Molecule object. Multijobs, in turn, accept the keyword argument children, which stores the list of children jobs.
The most meaningful part of each job object is its Settings instance. It is used to store information about the contents of the job’s input file and runscript, as well as other tweaks of the job’s behavior. Thanks to the tree-like structure of Settings, this information is organized in a convenient way: the top level (myjob.settings.) stores general settings, myjob.settings.input. is a branch for specifying input settings, myjob.settings.runscript. holds information for runscript creation and so on. Some types of jobs will make use of their own myjob.settings branches, and not every kind of job will always require the input or runscript branches (a multijob, for example, does not). The nice thing is that all the unnecessary data present in a job’s settings is simply ignored, so accidentally plugging in settings with too much data will not cause any problem (except in some cases where the whole content of some branch is used, like for example the input branch in SCMJob).
3.2.1.1. Contents of job’s settings¶
The following keys and branches of job’s settings are meaningful for all kinds of jobs:
- myjob.settings.input. is a branch storing settings regarding the input file of a job. The way data present in this branch is used depends on the type of job and is specified in the respective subclasses of Job.
- myjob.settings.runscript. holds runscript information, either program-specific or general:
  - myjob.settings.runscript.shebang – the first line of the runscript, starting with #!, describing the interpreter to use
  - myjob.settings.runscript.pre – an arbitrary string that will be placed in the runscript file just below the shebang line, before the actual contents
  - myjob.settings.runscript.post – an arbitrary string to put at the end of the runscript.
  - myjob.settings.runscript.stdout_redirect – boolean flag defining if standard output redirection should be handled inside the runscript. If set to False, the redirection will be done by Python outside the runscript. If set to True, standard output will be redirected inside the runscript using >.
- myjob.settings.run. branch stores run flags for the job (see below)
- myjob.settings.pickle is a boolean defining if the job object should be pickled after finishing
- myjob.settings.keep and myjob.settings.save are keys adjusting Cleaning job folder.
- myjob.settings.link_files decides if files from the job folder can be linked rather than copied when copying is requested
3.2.1.2. Default settings¶
Every job instance has an attribute called default_settings that stores a list of Settings instances that serve as default templates for that job. Initially this list contains only one element, the global defaults for all jobs stored in config.job. You can add other templates just like adding elements to a list:
>>> myjob.default_settings.append(sometemplate)
>>> myjob.default_settings += [temp1, temp2]
During job execution (just after prerun() is finished) the job’s own settings are soft-updated with all elements of the default_settings list, one by one, starting with the last. That way, if you want to adjust some setting for all jobs run in your script, you don’t need to go to each job and set it there every time: one change in config.job is enough. Similarly, if you have a group of jobs that need the same settings adjustments, you can create an empty Settings instance, put those adjustments in it and add it to each job’s default_settings. Keep in mind that soft_update() is used, so any key in a template in default_settings will end up in the job’s settings only if such a key is not yet present there. As a result, the order of templates in default_settings defines their priority: data from a preceding template will never override the following one, it can only enrich it.
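The ordering rule can be sketched with plain nested dictionaries standing in for Settings instances. The soft_update() and apply_defaults() functions below are illustrative stand-ins written for this example, not the PLAMS implementation, and the key names and values (such as queue) are hypothetical. The sketch also assumes that nested branches are merged recursively.

```python
import copy


def soft_update(target, template):
    """Copy keys from template into target only where target lacks them.

    Nested dictionaries are merged recursively. This is a plain-dict
    stand-in for the soft update described above.
    """
    for key, value in template.items():
        if key not in target:
            target[key] = copy.deepcopy(value)
        elif isinstance(target[key], dict) and isinstance(value, dict):
            soft_update(target[key], value)
    return target


def apply_defaults(settings, default_settings):
    """Soft-update settings with every template, starting with the last one."""
    for template in reversed(default_settings):
        soft_update(settings, template)
    return settings


global_defaults = {"pickle": True, "runscript": {"shebang": "#!/bin/sh"}}
group_template = {"pickle": False, "run": {"queue": "short"}}
job_settings = {"runscript": {"pre": "echo HelloWorld"}}

# group_template comes last in the list, so it is applied first and wins
# over global_defaults for the 'pickle' key.
apply_defaults(job_settings, [global_defaults, group_template])
```

Here the job keeps its own runscript.pre, gains runscript.shebang from the global defaults, and takes pickle from the later template because that template was applied first.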
3.2.2. Running a job¶
After creating a job instance and adjusting its settings you can finally run it. This is done by invoking the job’s run() method, which returns a Results instance:
>>> myresults = myjob.run()
Again, various keyword arguments can be passed here. With jobrunner and jobmanager you can specify which JobRunner and JobManager to use for your job. If those arguments are omitted, the default instances stored in config.default_jobrunner and config.jm are taken. All other keyword arguments passed here are collected and stored in the myjob.settings.run branch as one flat level. They can be used later by various objects involved in running your job; for example, GridRunner uses them to build the command executed to submit the runscript to the queueing system.
The following steps are taken after the run() method is called:
1. myjob.settings.run is soft-updated with run() keyword arguments.
2. If a parallel JobRunner was used, a new thread is spawned and all further steps of this list happen in this thread.
3. Explicit dependencies from myjob.depend are resolved. This means waiting for all jobs listed there to finish.
4. Job’s name gets registered in the job manager and the job folder is created.
5. Job’s prerun() method is called.
6. myjob.settings are updated according to contents of myjob.default_settings.
7. The hash of a job is calculated and checked (see Rerun prevention). If the same job was found as previously run, its results are copied (or linked) to the current job’s folder and run() finishes.
8. Now the real job execution happens. If your job is a single job, an input file and a runscript are produced and passed to job runner’s method call(). In case of multijob, run() method is called for all children jobs.
9. After the execution is finished, result files produced by the job are collected and check() is used to test if the execution was successful.
10. The job folder is cleaned using myjob.settings.keep. See Cleaning job folder for details.
11. Job’s postrun() method is called.
12. If myjob.settings.pickle is true, the whole job instance gets pickled and saved to the [jobname].dill file in the job folder.
3.2.2.1. Name conflicts¶
Jobs are identified by their names, and hence those names need to be unique. This is also required because a job’s name corresponds to the name of its folder. It is usually recommended to set unique names for jobs manually, for easier navigation through results. But for some applications, especially those that run large numbers of similar jobs, this is neither convenient nor necessary.
PLAMS automatically resolves conflicts between jobs’ names. During step 4 of the above list, if a job with the same name was already registered, the new job is renamed. The new name is created by appending a number to the old one. For example, the second job with the name plamsjob will be renamed to plamsjob.002, the third to plamsjob.003 and so on. The number of digits used in this counter can be adjusted in config.jobmanager.counter_len and the default value is 3. Overflowing the counter will not cause any problems: the job coming after plamsjob.999 will be called plamsjob.1000.
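The renaming scheme can be sketched as follows. The register_name() function and the registry dictionary are hypothetical helpers written for this illustration; they only reproduce the naming pattern described above and are not how the job manager is implemented.

```python
def register_name(name, registry, counter_len=3):
    """Return a unique job name, renaming on conflict.

    registry maps each base name to the number of jobs registered under it.
    """
    count = registry.get(name, 0) + 1
    registry[name] = count
    if count == 1:
        return name
    # Zero-pad to counter_len digits; longer numbers are left untruncated.
    return f"{name}.{count:0{counter_len}d}"


registry = {}
names = [register_name("plamsjob", registry) for _ in range(3)]

# Simulate 999 jobs already registered to show the counter overflowing.
registry["plamsjob"] = 999
overflow = register_name("plamsjob", registry)
```

With the default counter length, names holds plamsjob, plamsjob.002 and plamsjob.003, and overflow is plamsjob.1000.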
3.2.2.2. Prerun and postrun methods¶
The prerun() and postrun() methods are intended for further customization of your jobs. They can contain arbitrary pieces of code that are executed before and after the actual execution of your job. prerun() takes place after the job’s folder is created but before hash checking. Here are some ideas of what can be put there:
- adjusting job settings
- copying to job folder some files required for running
- extracting results of some other job, processing them and plugging to job
- generating children jobs in multijobs
See also Synchronization of parallel job executions for an explanation of how to use prerun() to automatically handle dependencies in parallel workflows.
The other method, postrun(), is called after job execution is finished, the results are collected and the job folder is cleaned. It is supposed to contain any kind of essential results postprocessing that needs to be done before results of this job can be pushed further in the workflow. For that purpose code contained in postrun() has some special privileges. At the time the method is executed the job is not yet considered done, so all threads requesting its results are waiting. However, the guardian restricting access to results of unfinished jobs can recognize code coming from postrun() and allow it to access and modify results. So calling Results methods can be safely done there, and you can be sure that everything you put in postrun() is done before other jobs have access to this job’s results.
The prerun() and postrun() methods can be added to your jobs in multiple ways:
- you can create a tiny subclass which redefines the method:
  >>> class MyJobWithPrerun(MyJob):
  >>>     def prerun(self):
  >>>         pass  # do stuff
  It can be done right inside your script. After the above definition you can create instances of the new class and treat them in exactly the same way you would treat MyJob instances. The only difference is that they will be equipped with the prerun() method you just defined.
- you can bind the method to an existing class using the add_to_class() decorator:
  >>> @add_to_class(MyJob)
  >>> def prerun(self):
  >>>     pass  # do stuff
  That change affects all instances of MyJob, even those created before the above code was executed (obviously it won’t affect instances previously run and finished).
- you can bind the method directly to an instance using the add_to_instance() decorator:
  >>> j = MyJob(...)
  >>> @add_to_instance(j)
  >>> def prerun(self):
  >>>     pass  # do stuff
  Only one specified instance (j) is affected this way.
All the above works for postrun() as well.
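To make the instance-level binding less mysterious, here is a minimal sketch of how such a decorator could work, using types.MethodType from the standard library. The add_to_instance() defined below is a stand-in written for this example and the MyJob class is a bare placeholder; the decorator shipped with PLAMS may be implemented differently.

```python
import types


def add_to_instance(instance):
    """Stand-in decorator: bind the decorated function to one instance only."""
    def decorator(func):
        setattr(instance, func.__name__, types.MethodType(func, instance))
        return func
    return decorator


class MyJob:
    """Placeholder job class with an initially empty prerun()."""

    def __init__(self, name):
        self.name = name
        self.log = []

    def prerun(self):
        pass


j = MyJob("first")
other = MyJob("second")


@add_to_instance(j)
def prerun(self):
    self.log.append("prerun of " + self.name)


j.prerun()      # uses the newly bound method
other.prerun()  # still uses the empty method defined in the class
```

Only j records a log entry, which mirrors the statement that a single specified instance is affected.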
3.2.2.3. Preview mode¶
Preview mode is a special way of running jobs without the actual runscript execution. In this mode the procedure of running a job is interrupted just after input and runscript files are written to job folder. Preview mode can be used to check if your jobs generate proper input and runscript files, without having to run the full calculation.
You can enable preview mode by putting the following line at the beginning of your script:
>>> config.preview = True
3.2.3. Job API¶
class Job(name='plamsjob', settings=None, depend=None)
General abstract class for all kinds of computational tasks.
Methods common to all kinds of jobs are gathered here. Instances of Job should never be created. It should not be subclassed either. If you wish to define a new type of job, please subclass either SingleJob or MultiJob.
Methods that are meant to be explicitly called by the user are run() and occasionally pickle(). In most cases Pickling is done automatically, but if for some reason you wish to do it manually, you can use the pickle() method.
Methods that can be safely overridden in subclasses are:
- check()
- hash() (see Rerun prevention)
- prerun() and postrun() (see Prerun and postrun methods)
Other methods should remain unchanged.
Class attribute _result_type defines the type of results associated with this job. It should point to a class and it must be a Results subclass.
Every job instance has the following attributes. Values of these attributes are adjusted automatically and should not be set by the user:
- status – current status of the job in human-readable format.
- results – reference to a results instance. An empty instance of the type stored in _result_type is created when the job constructor is called.
- path – an absolute path to the job folder.
- jobmanager – a job manager associated with this job.
- parent – a pointer to the parent job if this job is a child job of some MultiJob, None otherwise.
These attributes can be modified, but only before run() is called:
- name – the name of the job.
- settings – settings of the job.
- default_settings – see Default settings.
- depend – a list of explicit dependencies.
- _dont_pickle – additional list of this instance’s attributes that will be removed before pickling. See Pickling for details.
__getstate__()
Prepare an instance for pickling.
Attributes jobmanager, parent, default_settings and _lock are removed, as well as all attributes listed in self._dont_pickle.
run(jobrunner=None, jobmanager=None, **kwargs)
Run the job using jobmanager and jobrunner (or defaults, if None). Other keyword arguments (**kwargs) are stored in the run branch of job’s settings. Returned value is the Results instance associated with this job.
Note
This method should not be overridden.
Technical
This method does not do much by itself. After simple initial preparation it passes control to the job runner, which decides if a new thread should be started for this job. The role of the job runner is to execute three methods that make up the full job life cycle: _prepare(), _execute() and _finalize(). During _execute() the job runner is called once again to execute the runscript (only in case of SingleJob).
pickle(filename=None)
Pickle this instance and save to a file indicated by filename. If None, save to [jobname].dill in the job folder.
check()
Check if the calculation was successful.
This method can be overridden in concrete subclasses for different types of jobs. It should return a boolean value.
The definition here serves as a default, to prevent crashing if a subclass does not define its own check(). It always returns True.
prerun()
Actions to take before the actual job execution.
This method is initially empty, it can be defined in subclasses or directly added to either whole class or a single instance using Binding decorators.
postrun()
Actions to take just after the actual job execution.
This method is initially empty, it can be defined in subclasses or directly added to either whole class or a single instance using Binding decorators.
_prepare(jobmanager)
Prepare the job for execution. This method collects steps 1-7 from Running a job. Should not be overridden. Returned value indicates if job execution should continue (Rerun prevention did not find this job previously run).
_get_ready()
Get ready for _execute(). This is the last step before _execute() is called. Abstract method.
_finalize()
Gather the results of job execution and organize them. This method collects steps 9-12 from Running a job. Should not be overridden.
3.2.4. Single jobs¶
class SingleJob(molecule=None, name='plamsjob', settings=None, depend=None)
Abstract class representing a job consisting of a single execution of some external binary (or arbitrary shell script in general).
In addition to constructor arguments and attributes defined by Job, the constructor of this class accepts the keyword argument molecule that should be a Molecule instance. The constructor creates a copy of the supplied Molecule and stores it as the molecule attribute.
Class attribute _filenames defines default names for input, output, runscript and error files. If you wish to override this attribute it should be a dictionary with string keys 'inp', 'out', 'run', 'err'. The value for each key should be a string describing the corresponding file’s name. Shortcut $JN can be used for job’s name. The default value is defined in the following way:
>>> _filenames = {'inp':'$JN.in', 'run':'$JN.run', 'out':'$JN.out', 'err': '$JN.err'}
This class defines no new methods that could be directly called in your script. Methods that can and should be overridden are get_input() and get_runscript().
_filename(t)
Return filename for file of type t. t can be any key from the _filenames dictionary. $JN is replaced with the job name in the returned string.
get_input()
Generate the input file. Abstract method.
This method should return a single string with the full content of the input file. It should process information stored in the input branch of job’s settings and in the molecule attribute.
get_runscript()
Generate runscript. Abstract method.
This method should return a single string with runscript contents. It can process information stored in the runscript branch of job’s settings. In general the full runscript has the following form:
[first line defined by job.settings.runscript.shebang]
[contents of job.settings.runscript.pre, if any]
[value returned by get_runscript()]
[contents of job.settings.runscript.post, if any]
When overridden, this method should pay attention to the .runscript.stdout_redirect key in job’s settings.
hash()
Calculate unique hash of this instance.
The behavior of this method is adjusted by the value of the hashing key in JobManager settings. If no JobManager is yet associated with this job, the default setting from config.jobmanager.hashing is used.
Methods hash_input() and hash_runscript() are used to obtain hashes of, respectively, input and runscript.
Currently supported values for hashing are:
- False or None – returns None and disables Rerun prevention.
- input – returns hash of the input file.
- runscript – returns hash of the runscript.
- input+runscript – returns SHA256 hash of the concatenation of hashes of input and runscript.
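The four modes can be sketched with hashlib from the standard library. The job_hash() and sha256_hex() functions are hypothetical helpers for this illustration. The text above only states that the combined mode uses SHA256; using SHA256 for the individual input and runscript hashes as well is an assumption made here.

```python
import hashlib


def sha256_hex(text):
    """Return the SHA256 hex digest of a string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def job_hash(input_text, runscript_text, hashing="input"):
    """Sketch of the hashing modes listed above (SHA256 assumed throughout)."""
    if not hashing:  # False or None: no hash, rerun prevention disabled
        return None
    if hashing == "input":
        return sha256_hex(input_text)
    if hashing == "runscript":
        return sha256_hex(runscript_text)
    if hashing == "input+runscript":
        # Hash of the concatenation of the two individual hashes.
        return sha256_hex(sha256_hex(input_text) + sha256_hex(runscript_text))
    raise ValueError("unsupported hashing mode: " + repr(hashing))


same_input = job_hash("geometry A", "run v1") == job_hash("geometry A", "run v2")
combined_differs = (job_hash("geometry A", "run v1", "input+runscript")
                    != job_hash("geometry A", "run v2", "input+runscript"))
```

In input mode two jobs with identical input but different runscripts get the same hash, while the combined mode tells them apart.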
_full_runscript()
Generate full runscript, including shebang line and contents of pre and post, if any.
Technical
In practice this method is just a wrapper around get_runscript().
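A wrapper of this kind could look roughly like the sketch below, which assembles the four parts of the runscript in the order given for get_runscript(). The full_runscript() function, its parameters and the example command are hypothetical; how PLAMS joins the parts (for instance, its exact newline handling) is not specified here.

```python
def full_runscript(shebang, body, pre=None, post=None):
    """Assemble shebang, optional pre, body and optional post into one script."""
    parts = [shebang]
    if pre:
        parts.append(pre)
    parts.append(body)
    if post:
        parts.append(post)
    return "\n".join(parts) + "\n"


script = full_runscript(
    shebang="#!/bin/sh",
    body="my_program < job.in",  # hypothetical value returned by get_runscript()
    pre="echo HelloWorld",
)
```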
_get_ready()
Generate input and runscript files in the job folder. Methods get_input() and get_runscript() are used for that purpose.
3.2.4.1. Subclassing SingleJob¶
The SingleJob class was designed in a way that makes subclassing it quick and easy. Thanks to that, it takes very little effort to create a PLAMS interface for a new external binary.
Your new class has to, of course, be a subclass of SingleJob and define the methods get_input() and get_runscript():
>>> class MyJob(SingleJob):
>>>     def get_input(self):
>>>         ...
>>>         return 'string with input file'
>>>     def get_runscript(self):
>>>         ...
>>>         return 'string with runscript'
Note
The get_runscript() method should properly handle output redirection based on the value of myjob.settings.runscript.stdout_redirect. When False, no redirection should occur inside the runscript. If True, the runscript should be constructed in such a way that all standard output is redirected (using >) to the proper file (its name is “visible” as self._filename('out') from inside the get_runscript() body).
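The redirection rule can be sketched with a small stand-in class. SketchJob below does not inherit from the real SingleJob, the stdout_redirect flag is held as a plain attribute instead of living in settings.runscript, and my_program is a hypothetical binary name; the point is only to show where the > redirection is added.

```python
class SketchJob:
    """Minimal stand-in for a SingleJob subclass; not the PLAMS class."""

    def __init__(self, name, stdout_redirect=False):
        self.name = name
        self.stdout_redirect = stdout_redirect

    def _filename(self, kind):
        # Mirrors the '$JN.in' / '$JN.out' naming convention.
        return {"inp": self.name + ".in", "out": self.name + ".out"}[kind]

    def get_runscript(self):
        command = "my_program < " + self._filename("inp")  # hypothetical binary
        if self.stdout_redirect:
            # Redirect standard output inside the runscript itself.
            command += " > " + self._filename("out")
        return command + "\n"


inside = SketchJob("water", stdout_redirect=True).get_runscript()
outside = SketchJob("water", stdout_redirect=False).get_runscript()
```

When the flag is off, the runscript contains no redirection, and capturing the output is left to whatever executes the runscript.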
This is sufficient for your new job to work properly with other PLAMS components. However, there are other useful attributes and methods that can be overridden:
- check() – the default version of this method defined in Job always returns True and hence effectively disables correctness checking. If you wish to enable checking for your new class, you need to define a check() method in it, just like get_input() and get_runscript() in the example above. It should take no arguments other than self and return a boolean value indicating if job execution was successful. This method is privileged to have early access to Results methods in exactly the same way as postrun().
- if you wish to create a special Results subclass for results of your new job, make sure to let the job class know about it:
  >>> class MyResults(Results):
  >>>     def some_method(self, ...):
  >>>         ...
  >>>
  >>> class MyJob(SingleJob):
  >>>     _result_type = MyResults
  >>>     def get_input(self):
  >>>         ...
  >>>         return 'string with input file'
  >>>     def get_runscript(self):
  >>>         ...
  >>>         return 'string with runscript'
- hash_input() and hash_runscript() – see Rerun prevention for details
- if your new job requires some special preparations regarding input or runscript files, these preparations can be done for example in prerun(). However, if you wish to leave prerun() clean for further subclassing or adjusting in instance-based fashion, you can use another method called _get_ready(). This method is responsible for input and runscript creation, so if you decide to override it you must call its parent version in your version:
  >>> def _get_ready(self):
  >>>     # do some stuff
  >>>     SingleJob._get_ready(self)
  >>>     # do some other stuff
Warning
Whenever you are subclassing any kind of job, either single or multi, and you wish to override its constructor (__init__ method), it is absolutely essential to call the parent constructor and pass all unused keyword arguments to it:
>>> class MyJob(SomeOtherJob):
>>>     def __init__(self, myarg1, myarg2=default2, **kwargs):
>>>         SomeOtherJob.__init__(self, **kwargs)
>>>         # do stuff with myarg1 and myarg2
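A self-contained version of the same pattern is sketched below. BaseJob and ScaledJob are hypothetical stand-ins that only imitate the common constructor arguments; they are not PLAMS classes. The sketch uses super().__init__(**kwargs), which is equivalent to the explicit parent call above in this single-inheritance case.

```python
class BaseJob:
    """Stand-in for a job base class with the common keyword arguments."""

    def __init__(self, name="plamsjob", settings=None, depend=None):
        self.name = name
        self.settings = settings if settings is not None else {}
        self.depend = depend or []


class ScaledJob(BaseJob):
    """Hypothetical subclass adding its own constructor argument."""

    def __init__(self, scale, **kwargs):
        # Forward every keyword argument this class does not consume.
        super().__init__(**kwargs)
        self.scale = scale


job = ScaledJob(2.0, name="scaled", depend=["otherjob"])
```

Because the unused keyword arguments are forwarded, name and depend still reach the base constructor even though ScaledJob never mentions them.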
3.2.5. Multijobs¶
class MultiJob(children=None, name='plamsjob', settings=None, depend=None)
Concrete class representing a job that is a container for other jobs.
In addition to constructor arguments and attributes defined by Job, the constructor of this class accepts two keyword arguments:
- children – should be a list (or other iterable container) containing children jobs.
- childrunner – by default all the children jobs are run using the same JobRunner as the parent job. If you wish to use a different JobRunner for children, you can pass it using this argument.
Values passed as children and childrunner are stored as instance attributes and can be adjusted later, but before the run() method is called.
This class defines no new methods that could be directly called in your script.
When executed, a multijob runs all its children using the same run() arguments. If you need to specify different run flags for children you can do it by manually setting them in the child job’s Settings:
>>> childjob.settings.run.arg = 'value'
Since the run branch of settings gets soft-updated by run flags, a value set this way is not overwritten by the parent job.
Job folder of a multijob gets cleaned independently of its children. See Cleaning job folder for details.
new_children()
Generate new children jobs.
This method is useful when some of the children jobs are not known beforehand and need to be generated based on other children jobs, like for example in any kind of self-consistent procedure.
The goal of this method is to produce a new portion of children jobs. Newly created jobs should be returned in a container compatible with self.children (e.g. list for list, dict for dict). No adjustment of newly created jobs’ parent attribute is needed. This method cannot modify the _active_children attribute.
The method defined here is a default template, returning None, which means no new children jobs are generated and the entire execution of the parent job consists only of running jobs initially found in self.children. To modify this behavior you can override this method in a MultiJob subclass or use one of the Binding decorators, just like with Prerun and postrun methods.
check()
Check if the calculation was successful. Returns True if every child job has its status attribute set to 'successful'.
_get_ready()
Get ready for _execute(). Count children jobs and set their parent attribute.
_notify()
Notify this job that one of its children has finished.
Decrement _active_children by one. Use _lock to ensure thread safety.
_execute(jobrunner)
Run all children from children. Then use new_children() and run all jobs produced by it. Repeat this procedure until new_children() produces no new jobs (it returns None or an empty container). Wait for all started jobs to finish.
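The loop described for _execute() can be sketched as below. This is a simplified, sequential illustration: execute_multijob(), run_child and the step names are hypothetical, children are assumed to be held in a list, and the threading and waiting performed by a real job runner are left out.

```python
def execute_multijob(children, new_children, run_child):
    """Run initial children, then keep asking new_children() for more.

    new_children is a callable returning a list of further jobs, or None
    (or an empty list) when there is nothing left to run.
    """
    finished = []
    batch = list(children)
    while True:
        for child in batch:
            run_child(child)
            finished.append(child)
        batch = list(new_children() or [])
        if not batch:
            break
    return finished


# Hypothetical self-consistent procedure: two extra rounds, then stop.
rounds = iter([["step2"], ["step3"], None])
ran = []
order = execute_multijob(["step1"], lambda: next(rounds), ran.append)
```

Each round of new jobs is generated only after the previous batch has run, which is what lets later children depend on the results of earlier ones.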
3.2.5.1. Using MultiJob¶
Since MultiJob is a concrete class, it can be used in two ways: either by creating instances of it or by subclassing it. The simplest application is just to use an instance of MultiJob as a container grouping similar jobs that you wish to run at the same time using the same job runner:
>>> mj = MultiJob(name='somejobs', children=[job1, job2, job3])
>>> mj.children.append(job4)
>>> mj.run(...)
You can of course use it together with Prerun and postrun methods to further customize the behavior of mj.
A more flexible way of using multijobs is subclassing. You can subclass directly from MultiJob or from any of its subclasses. Defining your own multijob is the best solution when you need to run many similar jobs and later compare their results. In that case the prerun() method can be used for populating children and postrun() for extracting results and merging them.