RDKit interface¶
RDKit is a collection of cheminformatics and machine-learning software written in C++ and Python. PLAMS interface to RDKit originates from QMFlows project and features functions for generating PLAMS molecules from string representations (SMARTS, SMILES) as well as a handful of tools for dealing with proteins and PDB files.
-
add_Hs
(mol, forcefield=None, return_rdmol=False)[source]¶ Add hydrogens to protein molecules read from PDB. Makes sure that the hydrogens get the correct PDBResidue info.
- Parameters
- Returns
A molecule with explicit hydrogens added
- Return type
Molecule
or rdkit.Chem.Mol
-
apply_reaction_smarts
(mol, reaction_smarts, complete=False, forcefield=None, return_rdmol=False)[source]¶ Applies reaction smirks and returns product. If returned as a PLAMS molecule, thismolecule.properties.orig_atoms is a list of indices of atoms that have not been changed (which can for example be used partially optimize new atoms only with the freeze keyword)
- Parameters
mol (
Molecule
or rdkit.Chem.Mol) – molecule to be modifiedreactions_smarts (str) – Reactions smarts to be applied to molecule
complete (bool or float (value between 0 and 1)) – Apply reaction until no further changes occur or given fraction of reaction centers have been modified
forcefield (str) – Specify ‘uff’ or ‘mmff’ to apply forcefield based geometry optimization of product structures.
return_rdmol (bool) – return a RDKit molecule if true, otherwise a PLAMS molecule
- Returns
(product molecule, list of unchanged atoms)
- Return type
(
Molecule
, list of int)
-
apply_template
(mol, template)[source]¶ Modifies bond orders in PLAMS molecule according template smiles structure.
-
get_backbone_atoms
(mol)[source]¶ Return a list of atom indices corresponding to the backbone atoms in a peptide molecule. This function assumes PDB information in properties.pdb_info of each atom, which is the case if the molecule is generated with the “readpdb” or “from_sequence” functions.
-
modify_atom
(mol, idx, element)[source]¶ Change atom “idx” in molecule “mol” to “element” and add or remove hydrogens accordingly
-
to_rdmol
(plams_mol, sanitize=True, properties=True, assignChirality=False)[source]¶ Translate a PLAMS molecule into an RDKit molecule type. PLAMS
Molecule
,Atom
orBond
properties are pickled if they are neither booleans, floats, integers, floats nor strings, the resulting property names are appended with ‘_pickled’.- Parameters
plams_mol (
Molecule
) – A PLAMS moleculesanitize (bool) – Kekulize, check valencies, set aromaticity, conjugation and hybridization
properties (bool) – If all
Molecule
,Atom
andBond
properties should be converted from PLAMS to RDKit format.assignChirality (bool) – Assign R/S and cis/trans information, insofar as this was not yet present in the PLAMS molecule.
- Returns
an RDKit molecule
- Return type
rdkit.Chem.Mol
-
from_rdmol
(rdkit_mol, confid=- 1, properties=True)[source]¶ Translate an RDKit molecule into a PLAMS molecule type. RDKit properties will be unpickled if their name ends with ‘_pickled’.
-
from_sequence
(sequence, nconfs=1, name=None, forcefield=None, rms=0.1)[source]¶ Generates PLAMS molecule from a peptide sequence. Includes explicit hydrogens and 3D coordinates.
- Parameters
sequence (str) – A peptide sequence, e.g. ‘HAG’
nconfs (int) – Number of conformers to be generated
name (str) – A name for the molecule
forcefield (str) – Choose ‘uff’ or ‘mmff’ forcefield for geometry optimization and ranking of comformations. The default value None results in skipping of the geometry optimization step.
rms (float) – Root Mean Square deviation threshold for removing similar/equivalent conformations.
- Returns
A peptide molecule with hydrogens and 3D coordinates or a list of molecules if nconfs > 1
- Return type
Molecule
or list of PLAMS Molecules
-
from_smiles
(smiles, nconfs=1, name=None, forcefield=None, rms=0.1)[source]¶ Generates PLAMS molecule(s) from a smiles strings.
- Parameters
smiles (str) – A smiles string
nconfs (int) – Number of conformers to be generated
name (str) – A name for the molecule
forcefield (str) – Choose ‘uff’ or ‘mmff’ forcefield for geometry optimization and ranking of comformations. The default value None results in skipping of the geometry optimization step.
rms (float) – Root Mean Square deviation threshold for removing similar/equivalent conformations
- Returns
A molecule with hydrogens and 3D coordinates or a list of molecules if nconfs > 1
- Return type
Molecule
or list of PLAMS Molecules
-
from_smarts
(smarts, nconfs=1, name=None, forcefield=None, rms=0.1)[source]¶ Generates PLAMS molecule(s) from a smarts strings. This allows for example to define hydrogens explicitly. However it is less suitable for aromatic molecules (use from_smiles in that case).
- Parameters
smarts (str) – A smarts string
nconfs (int) – Number of conformers to be generated
name (str) – A name for the molecule
forcefield (str) – Choose ‘uff’ or ‘mmff’ forcefield for geometry optimization and ranking of comformations. The default value None results in skipping of the geometry optimization step.
rms (float) – Root Mean Square deviation threshold for removing similar/equivalent conformations.
- Returns
A molecule with hydrogens and 3D coordinates or a list of molecules if nconfs > 1
- Return type
Molecule
or list of PLAMS Molecules
-
to_smiles
(plams_mol, short_smiles=True, **kwargs)[source]¶ Returns the RDKit-generated SMILES string of a PLAMS molecule.
Note: SMILES strings are generated based on the molecule’s connectivity. If the input PLAMS molecule does not contain any bonds, “guessed bonds” will be used.
- Parameters
plams_mol – A PLAMS
Molecule
short_smiles (bool) – whether or not to use some RDKit sanitization to get shorter smiles (e.g. for a water molecule, short_smiles=True -> “O”, short_smiles=False -> [H]O[H])
**kwargs – With ‘kwargs’ you can provide extra optional parameters to the rdkit.Chem method ‘MolToSmiles’. See the rdkit documentation for more info.
- Returns
the SMILES string
-
partition_protein
(mol, residue_bonds=None, split_heteroatoms=True, return_rdmol=False)[source]¶ Splits a protein molecule into capped amino acid fragments and caps.
- Parameters
mol (
Molecule
or rdkit.Chem.Mol) – A protein moleculeresidue_bonds (tuple) – a tuple of pairs of residue number indicating which peptide bonds to split. If none, split all peptide bonds.
split_heteroatoms (bool) – if True, all bonds between a heteroatom and a non-heteroatom across residues are removed
- Returns
list of fragments, list of caps
-
readpdb
(pdb_file, sanitize=True, removeHs=False, proximityBonding=False, return_rdmol=False)[source]¶ Generate a molecule from a PDB file
-
writepdb
(mol, pdb_file=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]¶ Write a PDB file from a molecule
- Parameters
mol (
Molecule
or rdkit.Chem.Mol) – molecule to be exported to PDBpdb_file (path- or file-like) – The PDB file to write to, or a filename
-
get_substructure
(mol, func_list)[source]¶ Search for functional groups within a molecule based on a list of reference functional groups. SMILES strings, PLAMS and/or RDKit molecules can be used interchangeably in “func_list”.
Example:
>>> mol = from_smiles('OCCO') # Ethylene glycol >>> func_list = ['[H]O', 'C[N+]', 'O=PO'] >>> get_substructure(mol, func_list) {'[H]O': [(<scm.plams.mol.atom.Atom at 0x125183518>, <scm.plams.mol.atom.Atom at 0x1251836a0>), (<scm.plams.mol.atom.Atom at 0x125183550>, <scm.plams.mol.atom.Atom at 0x125183240>)]}
- Parameters
- Returns
A dictionary with functional groups from “func_list” as keys and a list of n-tuples with matching PLAMS
Atom
as values.
-
get_conformations
(mol, nconfs=1, name=None, forcefield=None, rms=- 1, enforceChirality=False, useExpTorsionAnglePrefs='default', constraint_ats=None, EmbedParameters='EmbedParameters', randomSeed=1, best_rms=- 1)[source]¶ Generates 3D conformation(s) for an rdkit_mol or a PLAMS Molecule
- Parameters
mol (rdkit.Chem.Mol or
Molecule
) – RDKit or PLAMS Moleculenconfs (int) – Number of conformers to be generated
name (str) – A name for the molecule
forcefield (str) – Choose ‘uff’ or ‘mmff’ forcefield for geometry optimization and ranking of comformations. The default value None results in skipping of the geometry optimization step
rms (float) – Root Mean Square deviation threshold for removing similar/equivalent conformations.
best_rms (float) – Root Mean Square deviation of best atomic permutation for removing similar/equivalent conformations.
enforceChirality (bool) – Enforce the correct chirality if chiral centers are present
useExpTorsionAnglePrefs (str) – Use experimental torsion angles preferences for the conformer generation by rdkit
constraint_ats (list) – List of atom indices to be constrained
EmbedParameters (str) – Name of RDKit EmbedParameters class (‘EmbedParameters’, ‘ETKDG’)
randomSeed (int) – The seed for the random number generator. If set to None the generated conformers will be non-deterministic.
- Returns
A molecule with hydrogens and 3D coordinates or a list of molecules if nconfs > 1
- Return type
Molecule
or list of PLAMS Molecules
-
yield_coords
(rdmol, id=- 1)[source]¶ Take an rdkit molecule and yield its coordinates as 3-tuples.
>>> from scm.plams import yield_coords >>> from rdkit import Chem >>> rdmol = Chem.Mol(...) # e.g. Methane >>> for xyz in yield_coords(rdmol): ... print(xyz) (-0.0, -0.0, -0.0) (0.6405, 0.6405, -0.6405) (0.6405, -0.6405, 0.6405) (-0.6405, 0.6405, 0.6405) (-0.6405, -0.6405, -0.6405)
The iterator produced by this function can, for example, be passed to
Molecule.from_array()
the update the coordinates of a PLAMS Molecule in-place.>>> from scm.plams import Molecule >>> mol = Molecule(...) >>> xyz_iterator = yield_coords(rdmol) >>> mol.from_array(xyz_iterator)
- Parameters
rdmol (rdkit.Chem.Mol) – An RDKit mol.
id (int) – The ID of the desired conformer.
- Returns
An iterator yielding 3-tuples with rdmol’s Cartesian coordinates.
- Return type
iterator
-
canonicalize_mol
(mol, inplace=False, **kwargs)[source]¶ Take a PLAMS molecule and sort its atoms based on their canonical rank.
Example:
>>> from scm.plams import Molecule, canonicalize_mol # Methane >>> mol: Molecule = ... >>> print(mol) Atoms: 1 H 0.640510 0.640510 -0.640510 2 H 0.640510 -0.640510 0.640510 3 C 0.000000 0.000000 0.000000 4 H -0.640510 0.640510 0.640510 5 H -0.640510 -0.640510 -0.640510 >>> print(canonicalize_mol(mol)) Atoms: 1 C 0.000000 0.000000 0.000000 2 H -0.640510 -0.640510 -0.640510 3 H -0.640510 0.640510 0.640510 4 H 0.640510 -0.640510 0.640510 5 H 0.640510 0.640510 -0.640510
- Parameters
mol (
Molecule
) – The to-be canonicalized molecule.inplace (bool) – Whether to sort the atoms inplace or to return a new molecule.
**kwargs – Further keyword arguments for rdkit.Chem.CanonicalRankAtoms.
- Returns
Either
None
or a newly sorted molecule, depending on the value ofinplace
.- Return type
None or
Molecule