2.1. Conformers¶
There are four types of conformer classes, which differ in their approach to duplicate recognition but share largely the same interface.
A description of the full interface is provided below for the UniqueConformersCrest class, followed by more abbreviated descriptions of the UniqueConformersRMSD, UniqueConformersTFD, and UniqueConformersAMS classes.
Overall, we recommend the UniqueConformersCrest class for its good balance of accuracy and efficiency, and for its ability to find and store rotamers.
The UniqueConformersAMS class is slow in filtering out duplicates for large symmetric molecular systems, but is a good choice if conformer sets need to be compared and clustered.
The UniqueConformersRMSD class is only able to filter out the most obvious duplicates. It is not able to identify duplicates if they involve symmetric images (e.g. rotations around methyl groups).
2.1.1. UniqueConformersCrest¶
A class holding the conformers of a molecule, using CREST duplicate recognition to filter out duplicates.
-
class
UniqueConformersCrest
(energy_threshold=0.05, rmsd_threshold=0.125, bconst_threshold=0.01)¶ Class representing a set of unique conformers
An instance of this class has the following attributes:
molecule – A PLAMS molecule object defining the connection data of the molecule
geometries – A list containing the coordinates of all conformers in the set
energies – A list containing the energies of all conformers in the set
rotamers – A list with UniqueConformersCrest objects representing the rotamer-set for each conformer
check_for_duplicates – Only accept a new conformer if the candidate is not a duplicate (if False, there is still a check for isomers and bond changes)
accept_isomers – Don’t reject isomers (default is to reject them)
accept_all – Accept any candidate in the set without checks
generator – A conformer generator object. Has to be set with set_generator(). The default generator is of the CRESTGenerator type.
A simple example of (parallel) use:
>>> from scm.plams import Molecule
>>> from scm.plams import init, finish
>>> from scm.conformers import UniqueConformersCrest
>>> # Set up the molecular data
>>> mol = Molecule('mol.xyz')
>>> conformers = UniqueConformersCrest()
>>> conformers.prepare_state(mol)
>>> # Set up PLAMS settings
>>> init()
>>> # Create the generator and run
>>> conformers.generate(nproc=1, maxjobs=12)
>>> finish()
>>> # Write the results to file
>>> print(conformers)
>>> conformers.write()
Note
The default generator for this conformer class is the CRESTGenerator, using the GFN1-xTB engine. This will generally take a lot of time. To speed things up, set a generator with a different engine prior to running generate():
>>> from scm.plams import Settings
>>> engine = Settings()
>>> engine.ForceField.Type = 'UFF'
>>> conformers.set_generator(method='crest', engine_settings=engine, nproc=1, maxjobs=12)
-
__init__
(energy_threshold=0.05, rmsd_threshold=0.125, bconst_threshold=0.01)¶ Creates an instance of the conformer class
energy_threshold – The energy difference above which conformers are always considered unique (kcal/mol).
rmsd_threshold – RMSD below which conformers are considered duplicates (Angstrom).
bconst_threshold – Relative rotational constant threshold used to determine whether conformers are unique.
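As a rough illustration of how the three thresholds might work together, the sketch below classifies a candidate from its differences with one stored conformer. The function name `classify_candidate`, its arguments, and in particular the order in which the criteria are combined (including the "rotamer" branch) are assumptions made for this example. They are not taken from the UniqueConformersCrest implementation.

```python
def classify_candidate(d_energy, rmsd, d_bconst,
                       energy_threshold=0.05, rmsd_threshold=0.125,
                       bconst_threshold=0.01):
    """Classify a candidate relative to one stored conformer (hypothetical logic).

    d_energy : absolute energy difference (kcal/mol)
    rmsd     : RMSD between the two geometries (Angstrom)
    d_bconst : relative difference in rotational constants (unitless)
    """
    # An energy difference above the threshold always means "unique".
    if d_energy > energy_threshold:
        return "unique"
    # Similar energy and nearly identical geometry: a plain duplicate.
    if rmsd < rmsd_threshold:
        return "duplicate"
    # Similar energy and rotational constants but a larger RMSD:
    # assumed here to indicate a rotamer (e.g. a rotated methyl group).
    if d_bconst < bconst_threshold:
        return "rotamer"
    return "unique"


print(classify_candidate(0.50, 0.05, 0.001))  # energy differs too much
print(classify_candidate(0.01, 0.05, 0.001))  # same energy, same geometry
print(classify_candidate(0.01, 0.80, 0.001))  # same energy, different RMSD
```

Checking the energy first keeps the cheap comparison ahead of the geometric ones, which is one plausible reason for an "always unique" energy cutoff.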
-
add_conformer
(coords, energy, reorder=True)¶ Adds the new coordinates to the list of conformers, if they are not duplicates
coords – A coordinate array for the candidate conformer
energy – The energy of the candidate conformer
reorder – Boolean specifying if the conformers should be ordered based on energy after addition of the candidate
Note
If the conformer is not unique, this returns the index of its duplicate. If it is unique, this returns None.
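The return convention described in the note can be mimicked with a small stand-in class. `ToyConformerSet` is hypothetical and uses only an energy comparison as its duplicate test, which is far simpler than what the real classes do. It is meant to show the `None` versus index return value and the effect of `reorder`, nothing more.

```python
class ToyConformerSet:
    """Hypothetical stand-in that mimics only the return convention of add_conformer."""

    def __init__(self, energy_threshold=0.05):
        self.energies = []
        self.geometries = []
        self.energy_threshold = energy_threshold

    def add_conformer(self, coords, energy, reorder=True):
        # Toy duplicate test: energies closer than the threshold count as duplicates.
        for i, stored in enumerate(self.energies):
            if abs(stored - energy) <= self.energy_threshold:
                return i  # index of the duplicate already in the set
        self.geometries.append(coords)
        self.energies.append(energy)
        if reorder:
            # Keep geometries and energies sorted by increasing energy.
            order = sorted(range(len(self.energies)), key=self.energies.__getitem__)
            self.energies = [self.energies[i] for i in order]
            self.geometries = [self.geometries[i] for i in order]
        return None  # the candidate was unique and has been stored


confs = ToyConformerSet()
first = confs.add_conformer([[0.0, 0.0, 0.0]], 1.20)   # unique
second = confs.add_conformer([[0.1, 0.0, 0.0]], 0.30)  # unique, moves to the front
third = confs.add_conformer([[0.1, 0.0, 0.0]], 0.31)   # duplicate of the lowest conformer
print(first, second, third)
```

A caller can therefore test `result is None` to find out whether the candidate was stored.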
-
set_generator
(method='crest', engine_settings=None, nproc=1, max_energy=6.0, maxjobs=1)¶ Store a generator object
Note
Overwrites previous generator object
method – A string, and one of the following options [‘crest’, ‘rdkit’]
engine_settings – PLAMS Settings object:
>>> engine_settings = Settings()
>>> engine_settings.DFTB.Model = 'GFN1-xTB'
nproc – Number of processors used for each single call to AMS
max_energy – Maximum accepted energy difference from the lowest energy conformer
maxjobs – Maximum number of parallel AMS processes
-
generate
(method='crest', nproc=1, maxjobs=1)¶ Generate conformers using the specified method
method – A string, and one of the following options [‘crest’, ‘rdkit’]
nproc – Number of processors used for each single call to AMS (only used if set_generator was not called)
maxjobs – Maximum number of parallel AMS processes (only used if set_generator was not called)
Note
Modifies the conformer set in place.
-
optimize
(convergence_level, optimizer=None, max_energy=None, engine_settings=None, nproc=1, maxjobs=1, name='go', verbose=False)¶ (Re)-Optimize the conformers currently in the set
convergence_level – One of the convergence options (‘tight’, ‘vtight’, ‘loose’, etc.)
optimizer – Instance of the ConformerOptimizer class. If not provided, an engine_settings object is required.
engine_settings – PLAMS Settings object:
>>> engine_settings = Settings()
>>> engine_settings.DFTB.Model = 'GFN1-xTB'
-
get_diffs_for_candidate
(coords, energy, iconf=None)¶ Find out how much the values in the candidate molecule differ from each conformer
coords – Coordinate array for the candidate conformer
energy – Energy of the candidate conformer (kcal/mol)
iconf – Optional: A single conformer index to compare the candidate with (default is to compare to all)
-
read
(dirname, name='crest', enfilename=None, reorder=True, filetype=None)¶ Read a conformer set from the specified directory
dirname – The directory name containing the conformer file
name – The name of the conformer file
enfilename – Optionally the name of a file containing the conformer energies (default: energies_name.txt)
reorder – Boolean specifying if the conformers need to be reordered based on energy
filetype – Extension of the conformer file (‘dcd’, ‘rkf’, ‘xyz’). If not provided, it is determined from the extensions of files in dirname
-
write
(dirname='.', name='crest', write_rotamers=False, filetype='dcd')¶ Write the conformers to file
write_rotamers – Boolean specifying if the rotamers of the conformers should be written to file
name – The name of the conformer file
dirname – The directory name containing the conformer file
filetype – Extension of the conformer file (‘dcd’ (default) or ‘rkf’).
-
clear
()¶ Remove all conformers
-
copy
()¶ Copy the conformer set
-
filter
(max_energy=None)¶ Filter all conformers again, possibly with a maximum allowed (relative) energy
-
find_clusters
(dist=5.0, criterion='maxclust', method='average', indices=None)¶ Assign all conformers to clusters
dist – Either the maximum number of clusters (for maxclust), or the maximum distance between clusters (for distance)
criterion – Determines how many clusters to make (maxclust or distance).
indices – A tuple whose elements are lists of indices for subsets of conformers
Note
Uses scipy’s fcluster method
-
find_nth_conformer
(i)¶ Find the index of the n-th conformer added (indices start at 0)
-
fit
()¶ Fit all conformers onto the first one in the set, and resave
-
classmethod
from_rdkitmol
(rdmol, energies=None, reorder=True)¶ Get all the conformers from the RDKit molecule
-
get_all_energies
()¶ Get all the energies in the set
-
get_all_geometries
()¶ Get all the geometries in the set
-
get_all_rmsds
()¶ Get the RMSD value from the lowest energy conformer for all conformers
-
get_conformers
()¶ Returns the conformers as a list of molecules
-
get_dendrogram
(method='average')¶ Gets a dendrogram reflecting the distances between conformers
Note
Uses scipy’s fcluster method
-
get_energies
()¶ Returns the energies in reference to the most stable
-
get_molecule
(i)¶ Return a molecule object for conformer i
-
get_plot_dendrogram
(dend, names=None, fontsize=4)¶ Makes a plot of the dendrogram
-
get_rdkitmol
()¶ Convert to RDKit molecule
-
get_rmsds_from_frame
(frame)¶ Get all RMSDs from a certain frame
-
indices_to_names
(indices1, indices2, name1='a', name2='b')¶ Convert two sets of indices to names for the conformers in self
Note
- Mostly for use related to clustering features
- Only works with two sets of indices.
- All indices need to be represented by these two lists
-
prepare_state
(mol)¶ Set up all the molecule data
-
remove_conformer
(index)¶ Remove a conformer from the set
-
remove_high_energy
(max_energy)¶ Remove all high energy conformers
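The relation between relative energies (as returned by get_energies()) and an energy cutoff can be sketched in a few lines of plain Python. The helper names `relative_energies` and `keep_low_energy` are hypothetical, and the sketch assumes that `max_energy` is measured relative to the lowest-energy conformer, as the description of max_energy in set_generator() suggests.

```python
def relative_energies(energies):
    """Return energies relative to the lowest one in the list."""
    e_min = min(energies)
    return [e - e_min for e in energies]


def keep_low_energy(energies, max_energy):
    """Return the indices of conformers within max_energy of the minimum."""
    rel = relative_energies(energies)
    return [i for i, de in enumerate(rel) if de <= max_energy]


energies = [-10.0, -9.5, -3.0, -8.0]  # hypothetical values in kcal/mol
print(relative_energies(energies))               # [0.0, 0.5, 7.0, 2.0]
print(keep_low_energy(energies, max_energy=6.0))  # [0, 1, 3]
```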
-
remove_non_minima
(save_rejected_to_file=False, rejected_filename='rejected_non_minima_conformers.xyz')¶ Perform PES point characterizations for all conformers and remove the ones that are not local minima. If save_rejected_to_file is True, rejected non-minimum conformers are saved to the file rejected_filename.
-
reorder
()¶ Reorder conformers from smallest to largest energy
-
rmsds
¶ Get the RMSD value from the lowest energy conformer for all conformers
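For readers unfamiliar with the quantity, a bare RMSD between two geometries can be computed as follows. This is a simplified sketch: it assumes both geometries list the atoms in the same order and are already superimposed (compare the fit() method), and it performs no alignment or symmetry handling of its own. The function names are hypothetical and do not belong to the conformer classes.

```python
import math


def rmsd(coords_a, coords_b):
    """Plain RMSD between two equally ordered coordinate lists (no alignment)."""
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate sets must have the same number of atoms")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


def rmsds_from_first(geometries):
    """RMSD of every geometry with respect to the first one in the list."""
    return [rmsd(geometries[0], g) for g in geometries]


reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
shifted = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]  # every atom moved by 1 Angstrom
print(rmsds_from_first([reference, shifted]))  # [0.0, 1.0]
```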
-
set_energies
(energies)¶ Set the energies of the conformers
2.1.2. UniqueConformersTFD¶
A class holding the conformers of a molecule, using the torsion fingerprint difference distance (TFD) to recognize and filter out duplicates.
-
class
UniqueConformersTFD
(tfd_threshold=0.05)¶ Class representing a set of unique conformers
An instance of this class has the following attributes:
molecule – A PLAMS molecule object defining the connection data of the molecule
rdmol – RDKit molecule object without conformers
geometries – A list containing the coordinates of all conformers in the set
energies – A list containing the energies of all conformers in the set
check_for_duplicates – Only accept a new conformer if the candidate is not a duplicate (if False, there is still a check for isomers and bond changes)
accept_isomers – Don’t reject isomers (default is to reject them)
accept_all – Accept any candidate in the set without checks
generator – A conformer generator object. Has to be set with set_generator(). The default generator is of the RDKitGenerator type.
A simple example of (parallel) use:
>>> from scm.plams import Molecule
>>> from scm.plams import init, finish
>>> from scm.conformers import UniqueConformersTFD
>>> # Set up the molecular data
>>> mol = Molecule('mol.xyz')
>>> conformers = UniqueConformersTFD()
>>> conformers.prepare_state(mol)
>>> # Set up PLAMS settings
>>> init()
>>> # Create the generator and run
>>> conformers.generate(nproc=1, maxjobs=12)
>>> finish()
>>> # Write the results to file
>>> print(conformers)
>>> conformers.write()
Note
The default generator for this conformer class is the RDKitGenerator, using the GFN1-xTB engine. This will generally take a lot of time. To speed things up, set a different generator prior to running generate():
>>> from scm.plams import Settings
>>> engine = Settings()
>>> engine.ForceField.Type = 'UFF'
>>> conformers.set_generator(method='rdkit', engine_settings=engine, nproc=1, maxjobs=12)
-
__init__
(tfd_threshold=0.05)¶ Creates an instance of the conformer class
tfd_threshold – Torsion fingerprint difference (TFD) threshold used to recognize duplicates (unitless)
-
prepare_state
(mol)¶ Set up all the molecule data
mol – PLAMS Molecule object
-
add_conformer
(coords, energy, reorder=True)¶ Adds the new coordinates to the list of conformers, if they are not duplicates
coords – A coordinate array for the candidate conformer
energy – The energy of the candidate conformer
reorder – Boolean specifying if the conformers should be ordered based on energy after addition of the candidate
Note
If the conformer is not unique, this returns the index of its duplicate. If it is unique, this returns None.
-
set_generator
(method='rdkit', engine_settings=None, nproc=1, max_energy=6.0, maxjobs=1)¶ Store a generator object
method – A string, and one of the following options [‘crest’, ‘rdkit’]
engine_settings – PLAMS Settings object:
>>> engine_settings = Settings()
>>> engine_settings.DFTB.Model = 'GFN1-xTB'
nproc – Number of processors used for each single call to AMS
max_energy – Maximum accepted energy difference from the lowest energy conformer
maxjobs – Maximum number of parallel AMS processes
Note
Overwrites previous generator object
-
generate
(method='rdkit', nproc=1, maxjobs=1)¶ Generate conformers using the specified method
method – A string, and one of the following options [‘crest’, ‘rdkit’]
nproc – Number of processors used for each single call to AMS (only used if set_generator was not called)
maxjobs – Maximum number of parallel AMS processes (only used if set_generator was not called)
Note
Modifies the conformer set in place.
-
get_diffs_for_candidate
(coords, energy, iconf=None)¶ Find out how much the values in the candidate molecule differ from each conformer
coords – Coordinate array for the candidate conformer
energy – Energy of the candidate conformer (kcal/mol)
iconf – Optional: A single conformer index to compare the candidate with (default is to compare to all)
-
read
(dirname, name='tfd', enfilename=None, reorder=True, filetype='dcd')¶ Read a conformer set from the specified directory
dirname – The directory name containing the conformer file
name – The name of the conformer file
enfilename – Optionally the name of a file containing the conformer energies (default: energies_name.txt)
reorder – Boolean specifying if the conformers need to be reordered based on energy
filetype – Extension of the conformer file (‘dcd’ (default) or ‘rkf’).
-
write
(dirname='.', name='tfd', filetype='dcd')¶ Write the conformers to file
name – The name of the conformer file
dirname – The directory name containing the conformer file
filetype – Extension of the conformer file (‘dcd’ (default) or ‘rkf’).
-
get_torsion_atoms
()¶ Returns all the torsion atoms involved in the TFD
Note
Each contribution is a list of sets of four atoms. Mostly the list has only one entry, but in case of symmetry, more sets of 4 atoms can contribute to a single torsion value.
-
get_torsion_values
(iconf)¶ Get the values of all the torsion angles for this conformer
Note
Each contribution is a list of torsion angles. Mostly the list has only one entry, but in the case of symmetry, or rings, several torsion angles contribute to a single TFP value.
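To make the torsion-based comparison more concrete, the sketch below computes a dihedral angle from four atomic positions and then a deliberately simplified, unweighted torsion difference between two conformers. The helper names are hypothetical, and `simple_tfd` is not the weighted TFD used by this class or by RDKit. It only illustrates the idea of averaging periodic torsion deviations normalized by 180 degrees.

```python
import math


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees for four points p0-p1-p2-p3."""
    b0 = [a - b for a, b in zip(p0, p1)]
    b1 = [a - b for a, b in zip(p2, p1)]
    b2 = [a - b for a, b in zip(p3, p2)]
    norm = math.sqrt(_dot(b1, b1))
    b1 = [c / norm for c in b1]
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    v = [a - _dot(b0, b1) * b for a, b in zip(b0, b1)]
    w = [a - _dot(b2, b1) * b for a, b in zip(b2, b1)]
    cross = [b1[1] * v[2] - b1[2] * v[1],
             b1[2] * v[0] - b1[0] * v[2],
             b1[0] * v[1] - b1[1] * v[0]]
    return math.degrees(math.atan2(_dot(cross, w), _dot(v, w)))


def torsion_deviation(a, b):
    """Smallest absolute difference between two angles in degrees (0 to 180)."""
    return abs((a - b + 180.0) % 360.0 - 180.0)


def simple_tfd(torsions_a, torsions_b):
    """Unweighted mean of normalized torsion deviations (simplified TFD)."""
    devs = [torsion_deviation(a, b) / 180.0 for a, b in zip(torsions_a, torsions_b)]
    return sum(devs) / len(devs)


cis = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
print(cis)  # 0.0 for a planar cis arrangement
print(simple_tfd([60.0, 170.0], [65.0, -170.0]))  # about 0.069, above a 0.05 threshold
```

The periodic deviation matters: 170 and -170 degrees differ by 20 degrees, not 340.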
2.1.3. UniqueConformersRMSD¶
A class holding the conformers of a molecule, using only RMSD to recognize and filter out duplicates.
-
class
UniqueConformersRMSD
(energy_threshold=0.05, rmsd_threshold=0.125)¶ Class representing a set of unique conformers
An instance of this class has the following attributes:
molecule – A PLAMS molecule object defining the connection data of the molecule
geometries – A list containing the coordinates of all conformers in the set
energies – A list containing the energies of all conformers in the set
check_for_duplicates – Only accept a new conformer if the candidate is not a duplicate (if False, there is still a check for isomers and bond changes)
accept_isomers – Don’t reject isomers (default is to reject them)
accept_all – Accept any candidate in the set without checks
generator – A conformer generator object. Has to be set with set_generator(). The default generator is of the RDKitGenerator type.
A simple example of (parallel) use:
>>> from scm.plams import Molecule
>>> from scm.plams import init, finish
>>> from scm.conformers import UniqueConformersRMSD
>>> # Set up the molecular data
>>> mol = Molecule('mol.xyz')
>>> conformers = UniqueConformersRMSD()
>>> conformers.prepare_state(mol)
>>> # Set up PLAMS settings
>>> init()
>>> # Create the generator and run
>>> conformers.generate(nproc=1, maxjobs=12)
>>> finish()
>>> # Write the results to file
>>> print(conformers)
>>> conformers.write()
Note
The default generator for this conformer class is the RDKitGenerator, using the UFF engine.
-
__init__
(energy_threshold=0.05, rmsd_threshold=0.125)¶ Creates an instance of the conformer class
energy_threshold – The energy difference above which conformers are always considered unique (kcal/mol).
rmsd_threshold – RMSD below which conformers are considered duplicates (Angstrom).
-
add_conformer
(coords, energy, reorder=True)¶ Adds the new coordinates to the list of conformers, if they are not duplicates
coords – A coordinate array for the candidate conformer
energy – The energy of the candidate conformer
reorder – Boolean specifying if the conformers should be ordered based on energy after addition of the candidate
Note
If the conformer is not unique, this returns the index of its duplicate. If it is unique, this returns None.
-
set_generator
(method='rdkit', engine_settings=None, nproc=1, max_energy=6.0, maxjobs=1)¶ Store a generator object
Note
Overwrites previous generator object
method – A string, and one of the following options [‘crest’, ‘rdkit’]
engine_settings – PLAMS Settings object:
>>> engine_settings = Settings()
>>> engine_settings.DFTB.Model = 'GFN1-xTB'
nproc – Number of processors used for each single call to AMS
max_energy – Maximum accepted energy difference from the lowest energy conformer
maxjobs – Maximum number of parallel AMS processes
-
generate
(method='rdkit', nproc=1, maxjobs=1)¶ Generate conformers using the specified method
method – A string, and one of the following options [‘crest’, ‘rdkit’]
nproc – Number of processors used for each single call to AMS (only used if set_generator was not called)
maxjobs – Maximum number of parallel AMS processes (only used if set_generator was not called)
Note
Modifies the conformer set in place.
-
optimize
(convergence_level, optimizer=None, max_energy=None, engine_settings=None, nproc=1, maxjobs=1, name='go', verbose=False)¶ (Re)-Optimize the conformers currently in the set
convergence_level – One of the convergence options (‘tight’, ‘vtight’, ‘loose’, etc.)
optimizer – Instance of the ConformerOptimizer class. If not provided, an engine_settings object is required.
engine_settings – PLAMS Settings object:
>>> engine_settings = Settings()
>>> engine_settings.DFTB.Model = 'GFN1-xTB'
-
get_diffs_for_candidate
(coords, energy, iconf=None)¶ Find out how much the values in the candidate molecule differ from each conformer
coords – Coordinate array for the candidate conformer
energy – Energy of the candidate conformer (kcal/mol)
iconf – Optional: A single conformer index to compare the candidate with (default is to compare to all)
-
read
(dirname, name='rmsd', enfilename=None, reorder=True, filetype=None)¶ Read a conformer set from the specified directory
dirname – The directory name containing the conformer file
name – The name of the conformer file
enfilename – Optionally the name of a file containing the conformer energies (default: energies_name.txt)
reorder – Boolean specifying if the conformers need to be reordered based on energy
filetype – Extension of the conformer file (‘dcd’, ‘rkf’, ‘xyz’). If not provided, it is determined from the extensions of files in dirname
-
write
(dirname='.', name='rmsd', filetype='dcd')¶ Write the conformers to file
name – The name of the conformer file
dirname – The directory name containing the conformer file
filetype – Extension of the conformer file (‘dcd’ (default) or ‘rkf’).
2.1.4. UniqueConformersAMS¶
A class holding the conformers of a molecule, using distance matrices and torsion angles to recognize and filter out duplicates.
-
class
UniqueConformersAMS
(energy_threshold=0.2, min_dihed=30, min_dist=0.1)¶ Class representing a set of unique conformers
An instance of this class has the following attributes:
molecule – A PLAMS molecule object defining the connection data of the molecule
geometries – A list containing the coordinates of all conformers in the set
energies – A list containing the energies of all conformers in the set
rotamers – A list with UniqueConformersAMS objects representing the rotamer-set for each conformer
check_for_duplicates – Only accept a new conformer if the candidate is not a duplicate (if False, there is still a check for isomers and bond changes)
accept_isomers – Don’t reject isomers (default is to reject them)
accept_all – Accept any candidate in the set without checks
generator – A conformer generator object. Has to be set with set_generator(). The default generator is of the RDKitGenerator type.
A simple example of (parallel) use:
>>> from scm.plams import Molecule
>>> from scm.plams import init, finish
>>> from scm.conformers import UniqueConformersAMS
>>> # Set up the molecular data
>>> mol = Molecule('mol.xyz')
>>> conformers = UniqueConformersAMS()
>>> conformers.prepare_state(mol)
>>> # Set up PLAMS settings
>>> init()
>>> # Create the generator and run
>>> conformers.generate(nproc=1, maxjobs=12)
>>> finish()
>>> # Write the results to file
>>> print(conformers)
>>> conformers.write()
The default generator for this conformer class is the RDKitGenerator. A list of all possible generators:
- RDKitGenerator
- TorsionGenerator
- CrestGenerator
By default the RDKitGenerator uses the UFF engine. To select a different engine, set a different generator prior to running generate():
>>> from scm.plams import Settings
>>> engine = Settings()
>>> engine.ForceField.Type = 'UFF'
>>> conformers.set_generator(method='rdkit', engine_settings=engine, nproc=1, maxjobs=12)
The RDKitGenerator first uses RDKit to generate an initial set of conformer geometries. These are then subjected to geometry optimization using an AMS engine, after which duplicates are filtered out. By default, the RDKitGenerator determines the number of initial conformers based on the number of rotatable bonds in the system. For a large molecule, this will result in a very large number of conformers. To set the number of initial conformers by hand, use:
>>> conformers.set_generator(method='rdkit', nproc=1, maxjobs=12)
>>> conformers.generator.set_number_initial_conformers(100)
>>> print('Initial number of conformers: ', conformers.generator.ngeoms)
-
__init__
(energy_threshold=0.2, min_dihed=30, min_dist=0.1)¶ Creates an instance of the conformer class
energy_threshold – The energy difference above which conformers are always considered unique (kcal/mol).
min_dist – Maximum difference a distance between two atoms can have for a conformer to be considered a duplicate.
min_dihed – Maximum difference a dihedral can have for a conformer to be considered a duplicate.
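The distance criterion can be pictured as a comparison of interatomic distance matrices, which are unaffected by translating or rotating a geometry. The sketch below is a simplified illustration under stated assumptions: the function names are hypothetical, only the `min_dist` criterion is applied, and the energy and dihedral checks as well as the removal of hydrogen atoms (see atoms_to_remove in prepare_state()) are left out.

```python
import math
from itertools import combinations


def pair_distances(coords):
    """Upper triangle of the interatomic distance matrix as a flat list."""
    return [math.dist(a, b) for a, b in combinations(coords, 2)]


def distances_match(coords_a, coords_b, min_dist=0.1):
    """True if no interatomic distance differs by more than min_dist."""
    da = pair_distances(coords_a)
    db = pair_distances(coords_b)
    return max((abs(x - y) for x, y in zip(da, db)), default=0.0) <= min_dist


base = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
moved = [(5.0, 5.0, 5.0), (6.0, 5.0, 5.0), (5.0, 6.0, 5.0)]      # same shape, translated
stretched = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.5, 0.0)]  # one atom displaced
print(distances_match(base, moved))      # True
print(distances_match(base, stretched))  # False
```

Because mirror images share the same distance matrix, a distance test alone cannot tell them apart, which is one reason a dihedral criterion is a useful complement.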
-
prepare_state
(mol, atoms_to_remove=None)¶ Set up all the molecule data
mol – A PLAMS Molecule object
atoms_to_remove – Optional: A list of atoms to be removed from the distance matrices (default is all H)
-
add_conformer
(coords, energy, reorder=True)¶ Adds the new coordinates to the list of conformers, if they are not duplicates
coords – A coordinate array for the candidate conformer
energy – The energy of the candidate conformer
reorder – Boolean specifying if the conformers should be ordered based on energy after addition of the candidate
Note
If the conformer is not unique, this method returns the index of its duplicate. If it is unique, this returns None.
-
set_generator
(method='rdkit', engine_settings=None, nproc=1, max_energy=6.0, maxjobs=1)¶ Store a generator object, to be used by the generate() method
method – A string, and one of the following options [‘crest’, ‘rdkit’]
engine_settings – PLAMS Settings object:
>>> engine_settings = Settings()
>>> engine_settings.DFTB.Model = 'GFN1-xTB'
nproc – Number of processors used for each single call to AMS
max_energy – Maximum accepted energy difference from the lowest energy conformer
maxjobs – Maximum number of parallel AMS processes
Note
Overwrites previously used generator object.
-
generate
(method='rdkit', nproc=1, maxjobs=1)¶ Generate conformers using the specified method
method – A string, and one of the following options [‘crest’, ‘rdkit’]
nproc – Number of processors used for each single call to AMS (only used if set_generator was not called)
maxjobs – Maximum number of parallel AMS processes (only used if set_generator was not called)
Note
Modifies the conformer set in place.
Note
If a generator was set previously with the set_generator() method, no arguments are required.
-
get_diffs_for_candidate
(coords, energy=0.0, iconf=None)¶ Find out how much the values in the candidate molecule differ from each conformer
coords – Coordinate array for the candidate conformer
energy – Energy of the candidate conformer (kcal/mol)
iconf – Optional: A single conformer index to compare the candidate with (default is to compare to all)
-
read
(dirname, name='ams', enfilename=None, reorder=True, filetype=None)¶ Read a conformer set from the specified directory
dirname – The directory name containing the conformer file
name – The name of the conformer file
enfilename – Optionally the name of a file containing the conformer energies (default: energies_name.txt)
reorder – Boolean specifying if the conformers need to be reordered based on energy
filetype – Extension of the conformer file (‘dcd’, ‘rkf’, ‘xyz’). If not provided, it is determined from the extensions of files in dirname
-
write
(dirname='.', name='ams', write_rotamers=False, filetype='dcd')¶ Write the conformers to file
write_rotamers – Boolean specifying if the rotamers of the conformers should be written to file
name – The name of the conformer file
dirname – The directory name containing the conformer file
filetype – Extension of the conformer file (‘dcd’ (default) or ‘rkf’).