RDKit interface

RDKit is a collection of cheminformatics and machine-learning software written in C++ and Python. PLAMS interface to RDKit originates from QMFlows project and features functions for generating PLAMS molecules from string representations (SMARTS, SMILES) as well as a handful of tools for dealing with proteins and PDB files.

add_Hs(mol, forcefield=None, return_rdmol=False)[source]

Add hydrogens to protein molecules read from PDB. Makes sure that the hydrogens get the correct PDBResidue info.

Parameters:
  • mol (Molecule or rdkit.Chem.Mol) – Molecule to be protonated

  • forcefield (str) – Specify ‘uff’ or ‘mmff’ to apply forcefield based geometry optimization on new atoms.

  • return_rdmol (bool) – return a RDKit molecule if true, otherwise a PLAMS molecule

Returns:

A molecule with explicit hydrogens added

Return type:

Molecule or rdkit.Chem.Mol

apply_reaction_smarts(mol, reaction_smarts, complete=False, forcefield=None, return_rdmol=False)[source]

Applies reaction smirks and returns product. If returned as a PLAMS molecule, thismolecule.properties.orig_atoms is a list of indices of atoms that have not been changed (which can for example be used partially optimize new atoms only with the freeze keyword)

Parameters:
  • mol (Molecule or rdkit.Chem.Mol) – molecule to be modified

  • reactions_smarts (str) – Reactions smarts to be applied to molecule

  • complete (bool or float (value between 0 and 1)) – Apply reaction until no further changes occur or given fraction of reaction centers have been modified

  • forcefield (str) – Specify ‘uff’ or ‘mmff’ to apply forcefield based geometry optimization of product structures.

  • return_rdmol (bool) – return a RDKit molecule if true, otherwise a PLAMS molecule

Returns:

(product molecule, list of unchanged atoms)

Return type:

(Molecule, list of int)

apply_template(mol, template)[source]

Modifies bond orders in PLAMS molecule according template smiles structure.

Parameters:
  • mol (Molecule or rdkit.Chem.Mol) – molecule to be modified

  • template (str) – smiles string defining the correct chemical structure

Returns:

Molecule with correct chemical structure and provided 3D coordinates

Return type:

Molecule

get_backbone_atoms(mol)[source]

Return a list of atom indices corresponding to the backbone atoms in a peptide molecule. This function assumes PDB information in properties.pdb_info of each atom, which is the case if the molecule is generated with the “readpdb” or “from_sequence” functions.

Parameters:

mol (Molecule or rdkit.Chem.Mol) – a peptide molecule

Returns:

a list of atom indices

Return type:

list

modify_atom(mol, idx, element)[source]

Change atom “idx” in molecule “mol” to “element” and add or remove hydrogens accordingly

Parameters:
  • mol (Molecule or rdkit.Chem.Mol) – molecule to be modified

  • idx (int) – index of the atom to be modified

  • element (str) –

Returns:

Molecule with new element and possibly added or removed hydrogens

Return type:

Molecule

to_rdmol(plams_mol, sanitize=True, properties=True, assignChirality=False)[source]

Translate a PLAMS molecule into an RDKit molecule type. PLAMS Molecule, Atom or Bond properties are pickled if they are neither booleans, floats, integers, floats nor strings, the resulting property names are appended with ‘_pickled’.

Parameters:
  • plams_mol (Molecule) – A PLAMS molecule

  • sanitize (bool) – Kekulize, check valencies, set aromaticity, conjugation and hybridization

  • properties (bool) – If all Molecule, Atom and Bond properties should be converted from PLAMS to RDKit format.

  • assignChirality (bool) – Assign R/S and cis/trans information, insofar as this was not yet present in the PLAMS molecule.

Returns:

an RDKit molecule

Return type:

rdkit.Chem.Mol

from_rdmol(rdkit_mol, confid=-1, properties=True)[source]

Translate an RDKit molecule into a PLAMS molecule type. RDKit properties will be unpickled if their name ends with ‘_pickled’.

Parameters:
  • rdkit_mol (rdkit.Chem.Mol) – RDKit molecule

  • confid (int) – conformer identifier from which to take coordinates

  • properties (bool) – If all Chem.Mol, Chem.Atom and Chem.Bond properties should be converted from RDKit to PLAMS format.

Returns:

a PLAMS molecule

Return type:

Molecule

from_sequence(sequence, nconfs=1, name=None, forcefield=None, rms=0.1)[source]

Generates PLAMS molecule from a peptide sequence. Includes explicit hydrogens and 3D coordinates.

Parameters:
  • sequence (str) – A peptide sequence, e.g. ‘HAG’

  • nconfs (int) – Number of conformers to be generated

  • name (str) – A name for the molecule

  • forcefield (str) – Choose ‘uff’ or ‘mmff’ forcefield for geometry optimization and ranking of comformations. The default value None results in skipping of the geometry optimization step.

  • rms (float) – Root Mean Square deviation threshold for removing similar/equivalent conformations.

Returns:

A peptide molecule with hydrogens and 3D coordinates or a list of molecules if nconfs > 1

Return type:

Molecule or list of PLAMS Molecules

from_smiles(smiles, nconfs=1, name=None, forcefield=None, rms=0.1)[source]

Generates PLAMS molecule(s) from a smiles strings.

Parameters:
  • smiles (str) – A smiles string

  • nconfs (int) – Number of conformers to be generated

  • name (str) – A name for the molecule

  • forcefield (str) – Choose ‘uff’ or ‘mmff’ forcefield for geometry optimization and ranking of conformations. The default value None results in skipping of the geometry optimization step.

  • rms (float) – Root Mean Square deviation threshold for removing similar/equivalent conformations

Returns:

A molecule with hydrogens and 3D coordinates or a list of molecules if nconfs > 1

Return type:

Molecule or list of PLAMS Molecules

from_smarts(smarts, nconfs=1, name=None, forcefield=None, rms=0.1)[source]

Generates PLAMS molecule(s) from a smarts strings. This allows for example to define hydrogens explicitly. However it is less suitable for aromatic molecules (use from_smiles in that case).

Parameters:
  • smarts (str) – A smarts string

  • nconfs (int) – Number of conformers to be generated

  • name (str) – A name for the molecule

  • forcefield (str) – Choose ‘uff’ or ‘mmff’ forcefield for geometry optimization and ranking of comformations. The default value None results in skipping of the geometry optimization step.

  • rms (float) – Root Mean Square deviation threshold for removing similar/equivalent conformations.

Returns:

A molecule with hydrogens and 3D coordinates or a list of molecules if nconfs > 1

Return type:

Molecule or list of PLAMS Molecules

to_smiles(plams_mol, short_smiles=True, **kwargs)[source]

Returns the RDKit-generated SMILES string of a PLAMS molecule.

Note: SMILES strings are generated based on the molecule’s connectivity. If the input PLAMS molecule does not contain any bonds, “guessed bonds” will be used.

Parameters:
  • plams_mol – A PLAMS Molecule

  • short_smiles (bool) – whether or not to use some RDKit sanitization to get shorter smiles (e.g. for a water molecule, short_smiles=True -> “O”, short_smiles=False -> [H]O[H])

  • **kwargs – With ‘kwargs’ you can provide extra optional parameters to the rdkit.Chem method ‘MolToSmiles’. See the rdkit documentation for more info.

Returns:

the SMILES string

partition_protein(mol, residue_bonds=None, split_heteroatoms=True, return_rdmol=False)[source]

Splits a protein molecule into capped amino acid fragments and caps.

Parameters:
  • mol (Molecule or rdkit.Chem.Mol) – A protein molecule

  • residue_bonds (tuple) – a tuple of pairs of residue number indicating which peptide bonds to split. If none, split all peptide bonds.

  • split_heteroatoms (bool) – if True, all bonds between a heteroatom and a non-heteroatom across residues are removed

Returns:

list of fragments, list of caps

readpdb(pdb_file, sanitize=True, removeHs=False, proximityBonding=False, return_rdmol=False)[source]

Generate a molecule from a PDB file

Parameters:
  • pdb_file (path- or file-like) – The PDB file to read

  • sanitize (bool) –

  • removeHs (bool) – Hydrogens are removed if True

  • return_rdmol (bool) – return a RDKit molecule if true, otherwise a PLAMS molecule

Returns:

The molecule

Return type:

Molecule or rdkit.Chem.Mol

writepdb(mol, pdb_file=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]

Write a PDB file from a molecule

Parameters:
  • mol (Molecule or rdkit.Chem.Mol) – molecule to be exported to PDB

  • pdb_file (path- or file-like) – The PDB file to write to, or a filename

get_substructure(mol, func_list)[source]

Search for functional groups within a molecule based on a list of reference functional groups. SMILES strings, PLAMS and/or RDKit molecules can be used interchangeably in “func_list”.

Example:

>>> mol = from_smiles('OCCO')  # Ethylene glycol
>>> func_list = ['[H]O', 'C[N+]', 'O=PO']
>>> get_substructure(mol, func_list)

{'[H]O': [(<scm.plams.mol.atom.Atom at 0x125183518>,
           <scm.plams.mol.atom.Atom at 0x1251836a0>),
          (<scm.plams.mol.atom.Atom at 0x125183550>,
           <scm.plams.mol.atom.Atom at 0x125183240>)]}
Parameters:
  • mol (Molecule) – A PLAMS molecule.

  • func_list (list) – A list of functional groups. Functional groups can be represented by SMILES strings, PLAMS and/or RDKit molecules.

Returns:

A dictionary with functional groups from “func_list” as keys and a list of n-tuples with matching PLAMS Atom as values.

get_conformations(mol, nconfs=1, name=None, forcefield=None, rms=-1, enforceChirality=False, useExpTorsionAnglePrefs='default', constraint_ats=None, EmbedParameters='EmbedParameters', randomSeed=1, best_rms=-1)[source]

Generates 3D conformation(s) for an rdkit_mol or a PLAMS Molecule

Parameters:
  • mol (rdkit.Chem.Mol or Molecule) – RDKit or PLAMS Molecule

  • nconfs (int) – Number of conformers to be generated

  • name (str) – A name for the molecule

  • forcefield (str) – Choose ‘uff’ or ‘mmff’ forcefield for geometry optimization and ranking of comformations. The default value None results in skipping of the geometry optimization step

  • rms (float) – Root Mean Square deviation threshold for removing similar/equivalent conformations.

  • best_rms (float) – Root Mean Square deviation of best atomic permutation for removing similar/equivalent conformations.

  • enforceChirality (bool) – Enforce the correct chirality if chiral centers are present

  • useExpTorsionAnglePrefs (str) – Use experimental torsion angles preferences for the conformer generation by rdkit

  • constraint_ats (list) – List of atom indices to be constrained

  • EmbedParameters (str) – Name of RDKit EmbedParameters class (‘EmbedParameters’, ‘ETKDG’)

  • randomSeed (int) – The seed for the random number generator. If set to None the generated conformers will be non-deterministic.

Returns:

A molecule with hydrogens and 3D coordinates or a list of molecules if nconfs > 1

Return type:

Molecule or list of PLAMS Molecules

yield_coords(rdmol, id=-1)[source]

Take an rdkit molecule and yield its coordinates as 3-tuples.

>>> from scm.plams import yield_coords
>>> from rdkit import Chem

>>> rdmol = Chem.Mol(...)  # e.g. Methane
>>> for xyz in yield_coords(rdmol):
...     print(xyz)
(-0.0, -0.0, -0.0)
(0.6405, 0.6405, -0.6405)
(0.6405, -0.6405, 0.6405)
(-0.6405, 0.6405, 0.6405)
(-0.6405, -0.6405, -0.6405)

The iterator produced by this function can, for example, be passed to Molecule.from_array() the update the coordinates of a PLAMS Molecule in-place.

>>> from scm.plams import Molecule

>>> mol = Molecule(...)

>>> xyz_iterator = yield_coords(rdmol)
>>> mol.from_array(xyz_iterator)
Parameters:
  • rdmol (rdkit.Chem.Mol) – An RDKit mol.

  • id (int) – The ID of the desired conformer.

Returns:

An iterator yielding 3-tuples with rdmol’s Cartesian coordinates.

Return type:

iterator

canonicalize_mol(mol, inplace=False, **kwargs)[source]

Take a PLAMS molecule and sort its atoms based on their canonical rank.

Example:

>>> from scm.plams import Molecule, canonicalize_mol

# Methane
>>> mol: Molecule = ...
>>> print(mol)
Atoms:
    1         H      0.640510      0.640510     -0.640510
    2         H      0.640510     -0.640510      0.640510
    3         C      0.000000      0.000000      0.000000
    4         H     -0.640510      0.640510      0.640510
    5         H     -0.640510     -0.640510     -0.640510

>>> print(canonicalize_mol(mol))
Atoms:
    1         C      0.000000      0.000000      0.000000
    2         H     -0.640510     -0.640510     -0.640510
    3         H     -0.640510      0.640510      0.640510
    4         H      0.640510     -0.640510      0.640510
    5         H      0.640510      0.640510     -0.640510
Parameters:
  • mol (Molecule) – The to-be canonicalized molecule.

  • inplace (bool) – Whether to sort the atoms inplace or to return a new molecule.

  • **kwargs – Further keyword arguments for rdkit.Chem.CanonicalRankAtoms.

Returns:

Either None or a newly sorted molecule, depending on the value of inplace.

Return type:

None or Molecule