FastSigma: a QSPR method to estimate COSMO sigma-profiles
Introduction
The traditional workflow for performing a COSMO-RS/-SAC calculation first involves a very expensive DFT geometry optimization and single point calculation in the COSMO phase to generate \(\sigma\)-profiles and other necessary parameters for COSMO-RS/-SAC. Once these parameters are known, the COSMO-RS/-SAC calculation can be performed extremely efficiently, often in only a matter of milliseconds. This imbalance of computational expense means there is a significant opportunity in circumventing the expensive DFT steps.
The FastSigma program reads a molecule in one of several formats (SMILES, .mol, .sdf) and estimates all of the properties required for a COSMO-RS/-SAC calculation: the HB-/Non-HB-/OT-/OH- \(\sigma\)-profiles, the COSMO surface area, and the COSMO volume, as well as bond energies that can be important for vapor phase or multispecies calculations. The code incorporates two distinct methods for estimating these COSMO-RS/-SAC properties. The first uses QSPR techniques similar to those applied in our Property Prediction program, which shares the same accepted atom types. The second uses a database of \(\sigma\)-profiles and a custom molecular graph hashing algorithm to build \(\sigma\)-profiles for query molecules from the \(\sigma\)-profiles of database molecules that contain similar substructures.
Both of these techniques are extremely efficient and can provide estimates for these essential COSMO-RS/-SAC properties in milliseconds. This allows quick thermodynamic calculations for a new molecule of interest and drastically speeds up searches through screening databases of molecular candidates compared to the traditional, full-fledged COSMO-RS/-SAC workflow.
Important
To use the SG1 method, users must first download the Subgraph Sigma Profile Estimation (SG1) Database (molsg_sg1db) using the AMS Package Manager.
Note
pyCRS can be used for Python scripting with FastSigma. Several Python examples are given in the pyCRS documentation.
Input options
A list of the input options and examples of their usage is given below.
Flag | Purpose | Example |
---|---|---|
-h [--help] | Produces help message | $AMSBIN/fast_sigma --help |
-s [--smiles] | Input molecule as a SMILES string | $AMSBIN/fast_sigma --smiles <SMILES> … |
-m [--mol] | Input molecule as a .mol file | $AMSBIN/fast_sigma --mol <mol file> … |
--sdf | Input molecule as an .sdf file | $AMSBIN/fast_sigma --sdf <file.sdf> … |
--model | Choose between the two models (FS1 or SG1) | $AMSBIN/fast_sigma --model FS1 … |
-d [--display] | Display problem results | $AMSBIN/fast_sigma -d … |
-o [--output] | Write output to file | $AMSBIN/fast_sigma -o <output.compkf> … |
--method | Choose a COSMO-RS/-SAC method | $AMSBIN/fast_sigma --method COSMO-RS … |
--model FS1
The FS1 model is a QSPR model. It currently has two supported methods: COSMO-RS and COSMOSAC2016. One of these method names must be entered after the --method flag. The default method is COSMO-RS.
--model SG1
The SG1 model is based on substructure hashing and database searching. It currently has two supported methods: COSMO-RS and COSMOSAC2013.
Note
This model can take a few seconds to load the required database. To estimate multiple compounds with this model, it is recommended to use pyCRS. With pyCRS, the database is loaded only during the first calculation and then stays in memory.
-o <output.compkf>
The FastSigma program writes its results to a file in .compkf format. The chosen output filename should generally end with .compkf. This suffix helps other parts of the code (COSMO-RS/-SAC/-UNIFAC/Solvent Optimization) recognize the format and use the file accordingly. If no filename is supplied, the program writes to a file called CRSKF.compkf.
-s <SMILES_string or .mol file>
Though COSMO-RS/-SAC can make estimates for many types of molecular species, the FastSigma program currently only supports organic, neutral, closed-shell molecules.
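The flags described above can be assembled programmatically when many molecules need to be processed. The following is a minimal Python sketch, not part of FastSigma or pyCRS: the helper name `build_fast_sigma_command` and the `SUPPORTED_METHODS` table are hypothetical, and the model/method pairs are simply copied from the descriptions above. It only builds the argument list; it assumes the flags may be combined in a single call and does not run the program.

```python
import os
import shlex

# Model/method combinations as stated in the option descriptions above.
SUPPORTED_METHODS = {
    "FS1": {"COSMO-RS", "COSMOSAC2016"},
    "SG1": {"COSMO-RS", "COSMOSAC2013"},
}


def build_fast_sigma_command(smiles=None, mol_file=None, sdf_file=None,
                             model="FS1", method="COSMO-RS",
                             output="CRSKF.compkf", display=False):
    """Return the argument list for one fast_sigma call (hypothetical helper)."""
    inputs = [("--smiles", smiles), ("--mol", mol_file), ("--sdf", sdf_file)]
    chosen = [(flag, value) for flag, value in inputs if value is not None]
    if len(chosen) != 1:
        raise ValueError("give exactly one of smiles, mol_file, sdf_file")
    if model not in SUPPORTED_METHODS:
        raise ValueError(f"unknown model: {model}")
    if method not in SUPPORTED_METHODS[model]:
        raise ValueError(f"method {method} is not listed for model {model}")

    # If AMSBIN is not set, the literal placeholder "$AMSBIN" is kept.
    exe = os.path.join(os.environ.get("AMSBIN", "$AMSBIN"), "fast_sigma")
    flag, value = chosen[0]
    cmd = [exe, flag, value, "--model", model, "--method", method, "-o", output]
    if display:
        cmd.append("-d")
    return cmd


cmd = build_fast_sigma_command(smiles="c1ccccc1(O)", display=True)
print(shlex.join(cmd))  # shell-quoted form of the command
```

The returned list can be passed to `subprocess.run` on a machine where AMS is installed; keeping it as a list avoids shell-quoting problems with SMILES strings that contain parentheses or brackets.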
GUI Input
The simplest way to use the FastSigma program is through the COSMO-RS GUI. There are two ways to do this:
SMILES string: Compounds → List of Compounds → Add Compound using FastSigma → SMILES and select Add.
.xyz file: Compounds → List of Compounds → Add Compound using FastSigma → .xyz, and select Add.
A .compkf file will be saved that can be used as input in COSMO-RS calculations.
Examples
This example calculates COSMO-RS (the default) parameters for phenol:
$AMSBIN/fast_sigma --smiles "c1ccccc1(O)" -d
sigma value Total profile HB profile
-0.025 0.000 0.000
-0.024 0.000 0.000
-0.023 0.000 0.000
-0.022 0.002 0.002
-0.021 0.054 0.054
-0.020 0.263 0.263
-0.019 0.523 0.523
-0.018 0.684 0.684
-0.017 0.828 0.828
-0.016 0.801 0.801
-0.015 0.732 0.716
-0.014 0.642 0.597
-0.013 0.653 0.519
-0.012 0.678 0.487
-0.011 0.607 0.423
-0.010 0.567 0.382
-0.009 0.646 0.245
-0.008 4.183 0.023
-0.007 7.405 0.000
-0.006 7.912 0.000
-0.005 6.701 0.000
-0.004 5.544 0.000
-0.003 4.658 0.000
-0.002 3.899 0.000
-0.001 4.097 0.000
0.000 6.109 0.000
0.001 7.854 0.000
0.002 8.640 0.000
0.003 9.726 0.000
0.004 11.175 0.000
0.005 12.524 0.000
0.006 8.673 0.000
0.007 2.255 0.000
0.008 1.174 0.161
0.009 1.279 1.159
0.010 1.442 1.442
0.011 1.759 1.751
0.012 1.795 1.788
0.013 0.838 0.829
0.014 0.095 0.093
0.015 0.054 0.054
0.016 0.030 0.030
0.017 0.000 0.000
0.018 0.000 0.000
0.019 0.000 0.000
0.020 0.000 0.000
0.021 0.000 0.000
0.022 0.000 0.000
0.023 0.000 0.000
0.024 0.000 0.000
0.025 0.000 0.000
Molecular Mass = 94.0418648120 g/mol
COSMO Area = 127.5012207186 Angstrom**2
COSMO Volume = 122.0791950835 Angstrom**3
Gas Phase Bond Energy = -2.9875007647 Hartree
Bond Energy = -2.9968155744 Hartree
Dispersion = -4.5319123638 kcal/mol
Deltaediel = 0.0000000000 Hartree
Nring = 6
Chemical Formula = C6H6O
SMILES = c1ccccc1(O)
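Output displayed with -d can be post-processed with a few lines of Python. The sketch below is an illustration only: the function name `parse_fast_sigma_output` is hypothetical, and the parser assumes the plain-text layout shown above (whitespace-separated profile rows followed by `key = value` lines). The embedded text is the nonzero part of the phenol output. For this example, the total profile column sums to about the reported COSMO area, which makes a convenient sanity check.

```python
PHENOL_OUTPUT = """\
sigma value Total profile HB profile
-0.022 0.002 0.002
-0.021 0.054 0.054
-0.020 0.263 0.263
-0.019 0.523 0.523
-0.018 0.684 0.684
-0.017 0.828 0.828
-0.016 0.801 0.801
-0.015 0.732 0.716
-0.014 0.642 0.597
-0.013 0.653 0.519
-0.012 0.678 0.487
-0.011 0.607 0.423
-0.010 0.567 0.382
-0.009 0.646 0.245
-0.008 4.183 0.023
-0.007 7.405 0.000
-0.006 7.912 0.000
-0.005 6.701 0.000
-0.004 5.544 0.000
-0.003 4.658 0.000
-0.002 3.899 0.000
-0.001 4.097 0.000
0.000 6.109 0.000
0.001 7.854 0.000
0.002 8.640 0.000
0.003 9.726 0.000
0.004 11.175 0.000
0.005 12.524 0.000
0.006 8.673 0.000
0.007 2.255 0.000
0.008 1.174 0.161
0.009 1.279 1.159
0.010 1.442 1.442
0.011 1.759 1.751
0.012 1.795 1.788
0.013 0.838 0.829
0.014 0.095 0.093
0.015 0.054 0.054
0.016 0.030 0.030
COSMO Area = 127.5012207186 Angstrom**2
COSMO Volume = 122.0791950835 Angstrom**3
Chemical Formula = C6H6O
SMILES = c1ccccc1(O)
"""


def parse_fast_sigma_output(text):
    """Split displayed output into profile rows and a property dictionary."""
    profile, properties = [], {}
    for line in text.splitlines():
        line = line.strip()
        if "=" in line:
            # Split at the first "=" only, so SMILES with "=" bonds survive.
            key, _, value = line.partition("=")
            properties[key.strip()] = value.strip()
            continue
        try:
            row = tuple(float(field) for field in line.split())
        except ValueError:
            continue  # header or warning line, not numeric
        if len(row) >= 2:
            profile.append(row)
    return profile, properties


profile, properties = parse_fast_sigma_output(PHENOL_OUTPUT)
summed_area = sum(row[1] for row in profile)
reported_area = float(properties["COSMO Area"].split()[0])
print(f"summed total profile: {summed_area:.3f}")
print(f"reported COSMO area:  {reported_area:.3f}")
```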
Additionally, we calculate the COSMOSAC2016 parameters for ibuprofen, supplied as a .mol file:
$AMSBIN/fast_sigma --mol Ibuprofen.mol --method COSMOSAC2016 -d
sigma value Total profile OH profile OT profile
-0.025 0.000 0.000 0.000
-0.024 0.000 0.000 0.000
-0.023 0.000 0.000 0.000
-0.022 0.000 0.000 0.000
-0.021 0.009 0.009 0.000
-0.020 0.062 0.061 0.000
-0.019 0.395 0.385 0.000
-0.018 0.914 0.881 0.000
-0.017 0.925 0.879 0.000
-0.016 0.840 0.781 0.000
-0.015 0.652 0.590 0.000
-0.014 0.697 0.606 0.000
-0.013 0.604 0.499 0.000
-0.012 0.561 0.398 0.000
-0.011 0.725 0.418 0.000
-0.010 0.833 0.350 0.000
-0.009 1.282 0.230 0.000
-0.008 2.141 0.158 0.000
-0.007 5.133 0.085 0.000
-0.006 10.428 0.048 0.000
-0.005 14.386 0.000 0.000
-0.004 23.816 0.000 0.000
-0.003 26.081 0.000 0.000
-0.002 23.295 0.000 0.000
-0.001 21.443 0.000 0.000
0.000 22.124 0.000 0.000
0.001 20.652 0.000 0.000
0.002 24.315 0.036 0.000
0.003 15.722 0.086 0.035
0.004 11.878 0.171 0.092
0.005 13.670 0.288 0.197
0.006 10.405 0.381 0.307
0.007 5.479 0.561 0.413
0.008 3.525 0.713 0.613
0.009 3.358 0.823 1.055
0.010 3.879 0.639 1.840
0.011 4.503 0.180 3.025
0.012 2.708 0.083 2.006
0.013 0.930 0.020 0.745
0.014 0.061 0.000 0.104
0.015 0.000 0.000 0.000
0.016 0.000 0.000 0.000
0.017 0.000 0.000 0.000
0.018 0.000 0.000 0.000
0.019 0.000 0.000 0.000
0.020 0.000 0.000 0.000
0.021 0.000 0.000 0.000
0.022 0.000 0.000 0.000
0.023 0.000 0.000 0.000
0.024 0.000 0.000 0.000
0.025 0.000 0.000 0.000
Molecular Mass = 206.1306798160 g/mol
COSMO Area = 278.4276940312 Angstrom**2
COSMO Volume = 279.3341044098 Angstrom**3
Gas Phase Bond Energy = -7.1463537624 Hartree
Bond Energy = -7.1619486814 Hartree
Dispersion = -9.7153055452 kcal/mol
Deltaediel = 0.0007518662 Hartree
Nring = 6
Chemical Formula = C13H18O2
SMILES = CC(C)Cc1ccc(C(C)C(=O)O)cc1
We can also use the SG1 model for phenol:
$AMSBIN/fast_sigma --smiles "c1ccccc1(O)" --model SG1 -d
sigma value Total profile HB profile
-0.025 0.000 0.000
-0.024 0.000 0.000
-0.023 0.000 0.000
-0.022 0.003 0.003
-0.021 0.067 0.067
-0.020 0.434 0.434
-0.019 0.878 0.878
-0.018 0.995 0.995
-0.017 0.996 0.996
-0.016 0.942 0.940
-0.015 0.771 0.766
-0.014 0.684 0.635
-0.013 0.610 0.549
-0.012 0.693 0.486
-0.011 0.671 0.397
-0.010 0.755 0.350
-0.009 1.344 0.255
-0.008 4.312 0.026
-0.007 7.751 0.000
-0.006 7.855 0.000
-0.005 6.819 0.000
-0.004 6.226 0.000
-0.003 5.612 0.000
-0.002 4.654 0.000
-0.001 4.679 0.000
0.000 4.969 0.000
0.001 5.814 0.000
0.002 7.672 0.000
0.003 10.711 0.000
0.004 12.231 0.000
0.005 12.061 0.000
0.006 8.394 0.000
0.007 3.355 0.000
0.008 1.677 0.153
0.009 1.434 1.226
0.010 1.566 1.566
0.011 1.972 1.972
0.012 2.133 2.133
0.013 0.966 0.966
0.014 0.062 0.062
0.015 0.000 0.000
0.016 0.000 0.000
0.017 0.000 0.000
0.018 0.000 0.000
0.019 0.000 0.000
0.020 0.000 0.000
0.021 0.000 0.000
0.022 0.000 0.000
0.023 0.000 0.000
0.024 0.000 0.000
0.025 0.000 0.000
Molecular Mass = 94.0418648120 g/mol
COSMO Area = 133.1606910587 Angstrom**2
COSMO Volume = 122.0268006780 Angstrom**3
Gas Phase Bond Energy = -2.9830476046 Hartree
Bond Energy = -2.9928087890 Hartree
Dispersion = 0.0000000000 kcal/mol
Deltaediel = 0.0000000000 Hartree
Nring = 6
Chemical Formula = C6H6O
SMILES = c1ccccc1(O)
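The two phenol estimates can be compared directly. The short Python sketch below uses only the scalar values printed in the FS1 and SG1 outputs above; the helper `relative_difference` is a hypothetical name introduced for illustration, and the comparison says nothing about which model is closer to a full DFT result.

```python
# Scalar properties copied from the two phenol outputs above.
fs1 = {
    "COSMO Area": 127.5012207186,            # Angstrom**2
    "COSMO Volume": 122.0791950835,          # Angstrom**3
    "Gas Phase Bond Energy": -2.9875007647,  # Hartree
    "Bond Energy": -2.9968155744,            # Hartree
}
sg1 = {
    "COSMO Area": 133.1606910587,
    "COSMO Volume": 122.0268006780,
    "Gas Phase Bond Energy": -2.9830476046,
    "Bond Energy": -2.9928087890,
}


def relative_difference(reference, other):
    """Signed difference of `other` from `reference`, as a fraction of |reference|."""
    return (other - reference) / abs(reference)


differences = {key: relative_difference(fs1[key], sg1[key]) for key in fs1}
for key, value in differences.items():
    print(f"{key:<24s} SG1 vs FS1: {100 * value:+.2f} %")
```

For this molecule the COSMO areas differ by a few percent, while the volumes and bond energies agree much more closely.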
A warning message is displayed if a molecule contains atoms or substructures that are not listed in the accepted atom types table. For example, in the compound C1=CC=[Ge]C=C1, the atom ‘Ge’ is not available in the QSPR method. As a result, the tool will yield an incorrect sigma profile.
$AMSBIN/fast_sigma --smiles "C1=CC=[Ge]C=C1" -d
WARNING: there are atoms and/or substructures in the molecule which cannot be estimated.
This will affect the accuracy of the results.
Atoms which cannot be estimated:
Ge
sigma value Total profile HB profile
-0.025 0.000 0.000
-0.024 0.000 0.000
-0.023 0.000 0.000
-0.022 0.000 0.000
-0.021 0.000 0.000
-0.020 0.000 0.000
-0.019 0.000 0.000
-0.018 0.000 0.000
-0.017 0.000 0.000
-0.016 0.000 0.000
-0.015 0.000 0.000
-0.014 0.000 0.000
-0.013 0.000 0.000
-0.012 0.000 0.000
-0.011 0.000 0.000
-0.010 0.000 0.000
-0.009 0.000 0.000
-0.008 0.896 0.000
-0.007 2.280 0.000
-0.006 5.170 0.000
-0.005 9.078 0.000
-0.004 9.044 0.000
-0.003 4.854 0.000
-0.002 4.211 0.000
-0.001 4.505 0.000
-0.000 4.415 0.000
0.001 4.824 0.000
0.002 4.750 0.000
0.003 5.745 0.000
0.004 3.006 0.000
0.005 4.904 0.000
0.006 5.411 0.000
0.007 4.222 0.000
0.008 2.623 0.000
0.009 0.000 0.000
0.010 0.000 0.000
0.011 0.000 0.000
0.012 0.000 0.000
0.013 0.000 0.000
0.014 0.000 0.000
0.015 0.000 0.000
0.016 0.000 0.000
0.017 0.000 0.000
0.018 0.000 0.000
0.019 0.000 0.000
0.020 0.000 0.000
0.021 0.000 0.000
0.022 0.000 0.000
0.023 0.000 0.000
0.024 0.000 0.000
0.025 0.000 0.000
Molecular Mass = 138.9603029600 g/mol
COSMO Area = 79.9378826526 Angstrom**2
COSMO Volume = 88.0443110798 Angstrom**3
Gas Phase Bond Energy = -2.2538026599 Hartree
Bond Energy = -2.2571102789 Hartree
Dispersion = -2.9625031363 kcal/mol
Deltaediel = 0.0000000000 Hartree
Nring = 6
Chemical Formula = C5H5Ge
SMILES = C1=CC=[Ge]C=C1
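When screening many molecules, it can be useful to flag results that carry this warning rather than use them silently. The Python sketch below is an illustration under stated assumptions: the function name `unsupported_atoms` is hypothetical, and it assumes the warning has the layout shown above, with the unsupported atoms listed on the lines between "Atoms which cannot be estimated:" and the profile header. How several atoms would be listed is not shown in this document, so whitespace-separated symbols are an assumption.

```python
GE_OUTPUT_HEAD = """\
WARNING: there are atoms and/or substructures in the molecule which cannot be estimated.
This will affect the accuracy of the results.
Atoms which cannot be estimated:
Ge
sigma value Total profile HB profile
-0.025 0.000 0.000
"""


def unsupported_atoms(output_text):
    """Return the atoms listed in the warning block, or [] if there is none."""
    atoms = []
    collecting = False
    for line in output_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("Atoms which cannot be estimated"):
            collecting = True
            continue
        if collecting:
            # The list ends at the profile header or at a blank line.
            if not stripped or stripped.startswith("sigma value"):
                break
            atoms.extend(stripped.split())
    return atoms


atoms = unsupported_atoms(GE_OUTPUT_HEAD)
if atoms:
    print("estimate is unreliable; unsupported atoms:", ", ".join(atoms))
```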