Overview
General description
Starting with version adf2005.01, the pdb2adf utility is part of the official release; previously it was available from the contributed software page. Starting with adf2008.01, the NEWQMMM subkey is supported when the environment variable SCM_PDB2ADF is set to NEW.
The pdb2adf utility reads a PDB file, which contains the atomic coordinates of a protein structure, and transforms it into an ADF input file, particularly for use in QM/MM calculations. Starting from the current release, it can also be used to set up a solvent shell around a solute molecule.
PDB files are generally used for protein structures and are formatted according to fixed rules; see http://www.wwpdb.org/docs.html and the section on the official PDB format below.
For every residue/molecule present in the PDB file, a fragment file should be available, either in the general ADF library ($ADFRESOURCES/pdb2adf directory) or in the local directory from which pdb2adf is called. Fragment files in the local directory take priority over those in the general ADF library. The format of the fragment files is loosely based on AMBER parameter files; they contain information about the residue, e.g. the atoms present with their general and force-field atom names, the atomic charges, and the connections to other atoms used to build atom positions that are missing from the PDB file (see the section on fragment files below). The ADF library provides fragment files for the amino acid residues (including N- and C-terminal residues), three solvents (water, methanol, chloroform), some ions frequently present in protein structures (copper, fluoride), etc.
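The search order just described (local directory first, then $ADFRESOURCES/pdb2adf) can be sketched in Python as below. This is an illustration, not the pdb2adf source: the helper name `find_fragment_file` is hypothetical, and because the naming scheme of the fragment files is not described here, the caller passes the file name explicitly.

```python
import os
from pathlib import Path
from typing import Optional


def find_fragment_file(filename: str,
                       local_dir: str = ".",
                       adfresources: Optional[str] = None) -> Optional[Path]:
    """Return the fragment file to use, or None if it is not found.

    Hypothetical helper: the local directory is searched first, so a local
    file overrides the copy in $ADFRESOURCES/pdb2adf.
    """
    if adfresources is None:
        adfresources = os.environ.get("ADFRESOURCES", "")
    candidates = [Path(local_dir) / filename]
    if adfresources:
        candidates.append(Path(adfresources) / "pdb2adf" / filename)
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None
```

A local copy of a fragment file therefore shadows the library version without any change to the library itself.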
The ADF library also contains solvent box files, which can be used to place a layer of solvent around the protein or around a solute. Boxes are available for the three solvents mentioned above.
After reading the PDB file and the corresponding fragment files, the program determines which atoms are missing and adds them, using the information in the fragment files. For certain amino acid residues several protonation states are possible; e.g. histidine can be protonated at the N-delta position, at the N-epsilon position, or at both. By default, aspartate (Asp), glutamate (Glu), and lysine (Lys) residues are taken in their fully charged form, while the protonation state of each histidine (His) and cysteine (Cys) residue is decided individually. For these individual cases, the program lists the distances to neighboring molecules/residues, which may help determine the protonation state. See the protein example below.
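A minimal Python sketch of the protonation defaults and of the kind of distance listing that helps decide a His or Cys state. The table `DEFAULT_PROTONATION`, the function `neighbours_within`, and the 3.5 Angstrom cutoff are illustrative assumptions, not pdb2adf internals.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]

# Documented defaults: Asp, Glu and Lys are taken fully charged, while His
# and Cys are decided per residue (None means "ask the user").
DEFAULT_PROTONATION: Dict[str, Optional[str]] = {
    "ASP": "charged",
    "GLU": "charged",
    "LYS": "charged",
    "HIS": None,
    "CYS": None,
}


def neighbours_within(center: Point,
                      atoms: Sequence[Tuple[str, Point]],
                      cutoff: float = 3.5) -> List[Tuple[str, float]]:
    """Return (label, distance) pairs closer than `cutoff`, nearest first.

    The 3.5 Angstrom default is an illustrative choice, not a pdb2adf value.
    """
    hits = []
    for label, xyz in atoms:
        d = math.dist(center, xyz)
        if d < cutoff:
            hits.append((label, d))
    return sorted(hits, key=lambda item: item[1])
```

For a histidine, such a listing would typically be produced once around ND1 and once around NE2, using the heavy atoms of the surrounding residues as `atoms`.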
Once this has been set up, the program lists the residue names/numbers, from which you choose the residues to be placed in the QM system; afterwards, for each selected QM residue, you choose where to cut off the QM part. The most appropriate cut-off point seems to be the C-alpha position, except for proline (Pro). Proline is cyclic: its side chain is bonded back to the backbone nitrogen, forming a ring that contains the C-alpha carbon. For that residue, it may be better to include the C-alpha, H-alpha, and backbone carbonyl group of the preceding residue in the QM part.
The program replaces the ".pdb" extension of the PDB file with ".pdb2adf" to name the ADF input file it creates. For convenience, it also writes a ".p2a.pdb" file containing the complete system as built by the program. This file can be opened in conventional viewer programs (such as iMol, VMD, Molekel, ADFview) to check visually that everything has been carried out correctly.
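The naming convention can be expressed with `pathlib`. The function name is hypothetical, and appending the new extensions when a file does not end in .pdb is an assumption, since the text only states that the .pdb extension is replaced.

```python
from pathlib import Path
from typing import Tuple


def pdb2adf_output_names(pdb_path: str) -> Tuple[Path, Path]:
    """Return (ADF input file, .p2a.pdb file) for a given PDB file name.

    Assumption: if the name does not end in ".pdb" (in any case), the new
    extensions are appended to the full file name instead.
    """
    path = Path(pdb_path)
    name = path.name
    base = name[:-4] if name.lower().endswith(".pdb") else name
    return path.with_name(base + ".pdb2adf"), path.with_name(base + ".p2a.pdb")
```

For example, `work/4azu.pdb` maps to `work/4azu.pdb2adf` and `work/4azu.p2a.pdb`, both in the directory of the input file.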
Two examples are given below: one for a protein, and one showing how to set up a solvent shell run.
Things to notice
- The current QM/MM implementation in ADF is limited to a total of 1000 QM/MM atoms. A new, more flexible implementation without this limit is under development; this work in progress is available through the NEWQMMM subkey.
- The NEWQMMM format is used if the environment variable SCM_PDB2ADF is set to NEW.
- The pdb2adf program uses AMBER parameter files and is set up to work with the AMBER95 version of the AMBER force field, which was designed for, and works well with, biomolecular systems.
- For questions or remarks, contact support@scm.com.
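A small Python helper for the two points above: checking a system against the 1000-atom limit, and preparing an environment with SCM_PDB2ADF=NEW for a child process. Both function names are hypothetical, and the pdb2adf command line is not documented here, so the actual call appears only as a comment.

```python
import os
from typing import Dict, Mapping, Optional

OLD_QMMM_ATOM_LIMIT = 1000  # limit of the current QM/MM implementation


def needs_new_qmmm(total_qmmm_atoms: int) -> bool:
    """True if the system exceeds the limit of the current QM/MM code."""
    return total_qmmm_atoms > OLD_QMMM_ATOM_LIMIT


def pdb2adf_environment(use_newqmmm: bool,
                        base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Copy of the environment with SCM_PDB2ADF=NEW set, or removed."""
    env = dict(os.environ if base is None else base)
    if use_newqmmm:
        env["SCM_PDB2ADF"] = "NEW"     # request the NEWQMMM format
    else:
        env.pop("SCM_PDB2ADF", None)   # fall back to the default format
    return env

# Hypothetical usage, assuming pdb2adf is on PATH (its arguments are not shown):
# subprocess.run(["pdb2adf"], env=pdb2adf_environment(needs_new_qmmm(n_atoms)))
```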
Official PDB format
Columns | Data Type | Field | Definition |
1 - 6 | Record name | ‘ATOM’ or ‘HETATM’ | |
7 - 11 | Integer | serial | Atom serial number. |
13 - 16 | Atom | name | Atom name. |
17 | Character | altLoc | Alternate location indicator. |
18 - 20 | Residue name | resName | Residue name. |
22 | Character | chainID | Chain identifier. |
23 - 26 | Integer | resSeq | Residue sequence number. |
27 | AChar | iCode | Code for insertion of residues. |
31 - 38 | Real(8.3) | x | Orthogonal coordinates for X in Angstroms. |
39 - 46 | Real(8.3) | y | Orthogonal coordinates for Y in Angstroms. |
47 - 54 | Real(8.3) | z | Orthogonal coordinates for Z in Angstroms. |
55 - 60 | Real(6.2) | occupancy | Occupancy. |
61 - 66 | Real(6.2) | tempFactor | Temperature factor. |
73 - 76 | LString(4) | segID | Segment identifier, left-justified. |
77 - 78 | LString(2) | element | Element symbol, right-justified. |
79 - 80 | LString(2) | charge | Charge on the atom. |
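The fixed-column layout in the table maps directly onto string slices. The Python sketch below (the `PdbAtom` class and `parse_atom_record` function are illustrative names, not part of pdb2adf) converts the 1-based, inclusive column ranges into slices and reads one ATOM/HETATM record.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PdbAtom:
    record: str                  # "ATOM" or "HETATM"
    serial: int
    name: str
    alt_loc: str
    res_name: str
    chain_id: str
    res_seq: int
    i_code: str
    x: float
    y: float
    z: float
    occupancy: Optional[float]
    temp_factor: Optional[float]
    seg_id: str
    element: str
    charge: str


def cols(line: str, first: int, last: int) -> str:
    """Columns first..last (1-based, inclusive, as in the table), stripped."""
    return line[first - 1:last].strip()


def opt_float(text: str) -> Optional[float]:
    return float(text) if text else None


def parse_atom_record(line: str) -> PdbAtom:
    record = cols(line, 1, 6)
    if record not in ("ATOM", "HETATM"):
        raise ValueError(f"not an ATOM/HETATM record: {record!r}")
    return PdbAtom(
        record=record,
        serial=int(cols(line, 7, 11)),
        name=cols(line, 13, 16),
        alt_loc=cols(line, 17, 17),
        res_name=cols(line, 18, 20),
        chain_id=cols(line, 22, 22),
        res_seq=int(cols(line, 23, 26)),
        i_code=cols(line, 27, 27),
        x=float(cols(line, 31, 38)),
        y=float(cols(line, 39, 46)),
        z=float(cols(line, 47, 54)),
        occupancy=opt_float(cols(line, 55, 60)),
        temp_factor=opt_float(cols(line, 61, 66)),
        seg_id=cols(line, 73, 76),      # short lines simply yield ""
        element=cols(line, 77, 78),
        charge=cols(line, 79, 80),
    )
```

Older files may use columns 73-80 for other information; in the 4AZU lines below, for instance, these columns hold an identifier and a line number, so the segID, element, and charge fields should be treated with care.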
Typical examples from PDB files:
         1         2         3         4         5         6         7         8
12345678901234567890123456789012345678901234567890123456789012345678901234567890
ATOM     76  O   GLY A9          6.671  55.354  35.873  1.00 14.75      A
ATOM     77  N   ASN A10         6.876  53.257  36.629  1.00 16.09      A
ATOM     62  O   GLY A   9       6.791  55.214  35.719  1.00 15.61      4AZU 153
ATOM     63  N   ASN A  10       6.892  53.135  36.555  1.00 12.64      4AZU 154
The pdb2adf utility is flexible and should be able to read most PDB files, even those with incomplete or erroneous line formats. From every ATOM/HETATM line it tries to read:
- atom number
- atom name
- residue name
- chain identifier
- residue number
- X, Y, Z coordinates
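Reading those six items without relying on exact columns can be sketched with whitespace splitting. This is one possible strategy, not how pdb2adf actually parses lines; it assumes whitespace between the coordinates and no blanks inside atom names, and it treats a letter fused to the residue number (as in "A9" in the first example above) as the chain identifier. The names `LooseAtom` and `parse_atom_tokens` are hypothetical.

```python
import re
from typing import NamedTuple, Tuple


class LooseAtom(NamedTuple):
    serial: int
    name: str
    res_name: str
    chain_id: str
    res_seq: int
    xyz: Tuple[float, float, float]


# A single letter immediately followed by an (optionally negative) integer.
_CHAIN_AND_NUMBER = re.compile(r"^([A-Za-z])(-?\d+)$")


def parse_atom_tokens(line: str) -> LooseAtom:
    """Read atom number, atom name, residue name, chain ID, residue number
    and X, Y, Z from a whitespace-separated ATOM/HETATM line."""
    tokens = line.split()
    if not tokens or tokens[0] not in ("ATOM", "HETATM"):
        raise ValueError("not an ATOM/HETATM line")
    serial, name, res_name = int(tokens[1]), tokens[2], tokens[3]
    rest = tokens[4:]
    fused = _CHAIN_AND_NUMBER.match(rest[0])
    if fused:                    # "A9": chain ID fused with residue number
        chain, res_seq, rest = fused.group(1), int(fused.group(2)), rest[1:]
    elif rest[0].isalpha():      # "A 9": separate chain identifier
        chain, res_seq, rest = rest[0], int(rest[1]), rest[2:]
    else:                        # "9": blank chain identifier
        chain, res_seq, rest = "", int(rest[0]), rest[1:]
    x, y, z = (float(t) for t in rest[:3])
    return LooseAtom(serial, name, res_name, chain, res_seq, (x, y, z))
```

Fields that run together, such as adjacent negative coordinates ("-12.345-45.678"), defeat plain whitespace splitting, which is one reason to follow the formatting hints below.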
Hints for proper formatting:
- always group together atoms that belong to one residue
- always give the atom name in columns 13-16
- when specifying a chain ID, use only letters (or a blank)
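The first hint can be checked before running pdb2adf. The sketch below (hypothetical helper `scattered_residues`) reports residues whose atoms do not appear as one contiguous block, given one key per ATOM/HETATM line in file order, for example (chain_id, res_seq).

```python
from typing import Hashable, Iterable, List


def scattered_residues(residue_keys: Iterable[Hashable]) -> List[Hashable]:
    """Return residues whose atoms are split over more than one block."""
    finished = set()      # residues whose block has already ended
    scattered = []        # offending residues, in order of detection
    previous = None
    for key in residue_keys:
        if key != previous:
            if key in finished and key not in scattered:
                scattered.append(key)
            if previous is not None:
                finished.add(previous)
            previous = key
    return scattered
```

Using the chain ID as part of the key relies on chain IDs being distinct letters (or blank), which matches the last hint.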
Contents of fragment file
Given below are the contents of the fragment file for water. The first line is a comment line; its only significant parameter is the NOCONNECT keyword, which tells the program not to make any connections to other residues/molecules. The next three lines define the orientation of the residue in space; they are not used for general fragments, but they are important for amino acid residues and DNA nucleotides. Finally, each atom in the molecule has one line containing: its number in the fragment; its name as used in PDB files; its AMBER force-field atom type; a dummy atom name; its connections to other atoms in the molecule together with the internal coordinates (bond length in Angstroms, bond angle and dihedral angle in degrees) used to place the atom if it is missing from the PDB file; its atomic charge; and, after the exclamation mark (!), its connections to other atoms in this fragment, or in other fragments in the case of amino acid residues/DNA nucleotides. The current version does not use the latter connections yet, but the next version probably will.
HOH Water molecule NOCONNECT
1 DUMM DU M 0 0 0 0.0000 0.0000 0.0000
2 DUMM DU M 1 0 0 1.4490 0.0000 0.0000
3 DUMM DU M 2 1 0 1.5220 111.1000 0.0000
4 O OW O 0 0 0 0.0000 0.0000 0.0000 -0.8340 ! 5 6
5 H1 HW H 4 0 0 0.9572 0.0000 0.0000 0.4170 ! 4
6 H2 HW H 4 5 0 0.9572 104.5200 0.0000 0.4170 ! 4
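To make the layout concrete, the Python sketch below parses atom lines of this form and rebuilds Cartesian coordinates from the internal coordinates, following the usual Z-matrix convention (bond partner, angle partner, dihedral partner, with 0 meaning "unused"); the full three-reference case uses the NeRF construction. The names are hypothetical, and placing atoms with missing references at the origin, along +x, or in a plane is an assumption of this sketch; pdb2adf itself places a missing atom relative to atoms already present in the PDB file.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple

Vec = Tuple[float, float, float]


@dataclass
class FragmentAtom:
    index: int                            # number in the fragment
    pdb_name: str                         # name used in PDB files
    amber_type: str                       # AMBER force-field atom type
    dummy_name: str
    refs: Tuple[int, int, int]            # bond, angle, dihedral partners (0 = unused)
    internal: Tuple[float, float, float]  # bond (Angstrom), angle, dihedral (degrees)
    charge: float                         # 0.0 when absent, as on the DUMM lines
    connections: List[int]                # atom numbers after the "!"


def parse_fragment_atom(line: str) -> FragmentAtom:
    body, _, conn = line.partition("!")
    f = body.split()
    charge = float(f[10]) if len(f) > 10 else 0.0
    return FragmentAtom(int(f[0]), f[1], f[2], f[3],
                        (int(f[4]), int(f[5]), int(f[6])),
                        (float(f[7]), float(f[8]), float(f[9])),
                        charge, [int(c) for c in conn.split()])


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _lin(*terms: Tuple[float, Vec]) -> Vec:
    """Linear combination sum(s * v) over (scale, vector) pairs."""
    return tuple(sum(s * v[i] for s, v in terms) for i in range(3))


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def _unit(a: Vec) -> Vec:
    n = math.sqrt(a[0] ** 2 + a[1] ** 2 + a[2] ** 2)
    return (a[0] / n, a[1] / n, a[2] / n)


def place_atom(atom: FragmentAtom, xyz: Dict[int, Vec]) -> Vec:
    """Cartesian position from internal coordinates to already-placed atoms."""
    a, b, c = atom.refs
    r = atom.internal[0]
    theta, phi = math.radians(atom.internal[1]), math.radians(atom.internal[2])
    if a == 0:                     # no bond partner: origin
        return (0.0, 0.0, 0.0)
    pa = xyz[a]
    if b == 0:                     # bond only: along +x from the partner
        return _lin((1.0, pa), (r, (1.0, 0.0, 0.0)))
    u = _unit(_sub(xyz[b], pa))    # bond partner -> angle partner
    if c == 0:                     # bond + angle: any v perpendicular to u works
        helper = (0.0, 0.0, 1.0) if abs(u[2]) < 0.9 else (0.0, 1.0, 0.0)
        v = _unit(_cross(u, helper))
        return _lin((1.0, pa), (r * math.cos(theta), u), (r * math.sin(theta), v))
    # Bond, angle and dihedral: NeRF local frame (bc, m, n) around the partner.
    bc = _unit(_sub(pa, xyz[b]))
    n = _unit(_cross(_sub(xyz[b], xyz[c]), bc))
    m = _cross(n, bc)
    return _lin((1.0, pa), (-r * math.cos(theta), bc),
                (r * math.sin(theta) * math.cos(phi), m),
                (r * math.sin(theta) * math.sin(phi), n))
```

Applied line by line to the water fragment above, this gives O-H distances of 0.9572 Angstrom and an H-O-H angle of 104.52 degrees, and the three atomic charges sum to zero.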
Contents of solvent box files
The first line is a comment line. It is followed by a line with the total number of atoms in the solvent box and the dimensions of the box (in Angstroms). Then, for each atom in the box, there is a line with the atom name, which must match the PDB atom name, and its Cartesian coordinates, again in Angstroms.
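A matching reader can be sketched as follows. It assumes whitespace-separated fields and reads every value after the atom count on the second line as a box dimension, since the exact layout of that line is not specified above; `parse_solvent_box` is a hypothetical name.

```python
from typing import List, Tuple

BoxAtom = Tuple[str, float, float, float]


def parse_solvent_box(text: str) -> Tuple[int, List[float], List[BoxAtom]]:
    """Return (atom count, box dimensions, [(name, x, y, z), ...]).

    Line 1 is a comment, line 2 holds the atom count followed by the box
    dimensions, and each further non-empty line holds name x y z (Angstrom).
    """
    lines = text.splitlines()
    header = lines[1].split()
    n_atoms = int(header[0])
    dims = [float(v) for v in header[1:]]
    atoms: List[BoxAtom] = []
    for ln in (ln for ln in lines[2:] if ln.strip()):
        name, x, y, z = ln.split()[:4]
        atoms.append((name, float(x), float(y), float(z)))
    if len(atoms) != n_atoms:
        raise ValueError(f"expected {n_atoms} atoms, found {len(atoms)}")
    return n_atoms, dims, atoms
```

Checking the atom count against the number of coordinate lines catches truncated or hand-edited box files early.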