Database¶
The submodule contains several classes that provide an interface to an SQL database for managing COSKF files and physical properties.
-
class
pyCRS.Database.
COSKFDatabase
(path)¶ A class that provides an interface to an SQL database containing the following tables.
Table name – Description
Compound – contains unique compounds along with their COSKF file, based on either the CAS number or any preferred identifier
Conformer – contains multiple conformers along with their COSKF files
PhysicalProperty – contains the physical properties input by the user
PropPred – contains the physical properties estimated with QSPR methods based on SMILES
- Parameters
path (str) – a path to the database file. If this file does not exist, it will be created.
-
add_compound
(coskf_file, name=None, cas=None, identifier=None, coskf_path=None, smiles=None, nring=None, ignore_smiles_check=False, ignore_duplicates=False)¶ Adds a new .coskf file to the database.
- Parameters
coskf_file (str) – a path to the .coskf file or, if coskf_path is provided, the file name of the .coskf file.
- Keyword Arguments
name (str) – The entry’s name, such as the compound name. If not provided, the first available value is used, in this order of priority: the IUPAC name, the CAS number, the identifier, or the name of the .coskf file. These values are taken from the arguments passed to add_compound() or from the ‘Compound Data’ section of the .coskf file.
cas (str) – The CAS number of the molecule. If not provided, the CAS number stored in the .coskf file is used, if available.
identifier (str) – The chemical identifier of the molecule.
coskf_path (str) – The directory path to the .coskf file. If not provided, the method attempts to locate the path of the ADFCRS-2018 database.
smiles (str) – The SMILES string of the molecule. If not provided, the SMILES stored in the .coskf file is used, if available.
nring (int) – The number of ring atoms. If not provided, the Nring value stored in the .coskf file is used, if available.
ignore_smiles_check (bool) – If set to True, skip generating the SMILES from the compound’s coordinates to confirm its identity against the database. Default is False.
ignore_duplicates (bool) – If set to True, skip duplicate recognition using UniqueConformersCrest in the AMSConformer tool. Default is False.
Note
Ensure every compound has a unique representation, either by CAS number or by a preferred identifier. During the add_compound process, both the CAS number and the identifier are checked for uniqueness in the database. If multiple compounds share the same CAS number or identifier, an error is raised. For instance, the operation below is not allowed, since both compounds share the same identifier="CRS0001":
db.add_compound("Benzene.coskf",cas="71-43-2",identifier="CRS0001")
db.add_compound("Ethanol.coskf",cas="64-17-5",identifier="CRS0001")
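The uniqueness rule in the note can be pictured with a small stand-alone sketch. It uses Python's built-in sqlite3 module with a hypothetical one-table schema; the table name, the column names and the add_row helper are assumptions for illustration, not the actual layout of the database file or the behaviour of add_compound. It only shows how a second row that reuses an identifier would be rejected:

```python
import sqlite3

# Hypothetical schema for illustration only; the real database layout may differ.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE compound ("
    " compound_id INTEGER PRIMARY KEY,"
    " name TEXT,"
    " cas TEXT UNIQUE,"
    " identifier TEXT UNIQUE)"
)

def add_row(name, cas=None, identifier=None):
    """Insert a row; return True on success, False if cas or identifier is already used."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO compound (name, cas, identifier) VALUES (?, ?, ?)",
                (name, cas, identifier),
            )
        return True
    except sqlite3.IntegrityError:
        return False

first = add_row("Benzene", cas="71-43-2", identifier="CRS0001")
second = add_row("Ethanol", cas="64-17-5", identifier="CRS0001")  # identifier reused
row_count = conn.execute("SELECT COUNT(*) FROM compound").fetchone()[0]
print(first, second, row_count)
```

In this sketch the duplicate is reported through a return value, whereas the note above states that the library raises an error.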
-
add_physical_property
(identifier, attribute, value, unit=None)¶ Add a value of a physical property to the PhysicalProperty TABLE in the database by the compound’s identifier
- Parameters
identifier (str) – the string representing either the CAS number, the identifier or the name of a compound
attribute (str) – the name of the physical property (e.g. meltingpoint or hfusion)
value (float) – the value of the physical property
- Keyword Arguments
unit (str) – (optional) the unit of the input value. The default units are K, kcal/mol and kcal/mol-K. The accepted units are K, C, kcal/mol, kJ/mol, cal/g, J/g, kcal/mol-K, kJ/mol-K, cal/g-K and J/g-K.
Example
db.add_physical_property("Benzene","meltingpoint",278.7)
db.add_physical_property("Benzene","hfusion",9.91,unit="kJ/mol")
db.add_physical_property("Benzene","vp_equation","Antoine")
db.add_physical_property("Benzene","vp_params","4.72583, 1660.652, -1.461")
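To make the unit handling concrete, here is a minimal sketch of converting an input value to the default units. The helper to_default_unit is hypothetical and not part of pyCRS; how the library converts values internally is not documented here. The sketch covers only the conversions that need no extra data, using 1 kcal = 4.184 kJ and 0 °C = 273.15 K:

```python
# Hypothetical helper, not part of pyCRS: converts a value to the default units
# listed above (K, kcal/mol, kcal/mol-K).
KJ_PER_KCAL = 4.184  # thermochemical calorie

def to_default_unit(value, unit=None):
    """Return value expressed in K, kcal/mol or kcal/mol-K."""
    if unit is None or unit in ("K", "kcal/mol", "kcal/mol-K"):
        return value  # already in a default unit
    if unit == "C":
        return value + 273.15
    if unit in ("kJ/mol", "kJ/mol-K"):
        return value / KJ_PER_KCAL
    # cal/g, J/g, cal/g-K and J/g-K also need the molar mass, omitted here.
    raise ValueError(f"unit not handled in this sketch: {unit}")

hfusion_kcal = to_default_unit(9.91, "kJ/mol")
melting_k = to_default_unit(5.55, "C")
print(round(hfusion_kcal, 3), round(melting_k, 2))
```

The per-gram units are left out on purpose: converting them to molar quantities requires the molar mass of the compound, which this sketch does not have.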
-
clear_physical_property
(identifier: str, attribute: Optional[str] = None)¶ Clear the value of a physical property in the PhysicalProperty TABLE in the database by the compound’s identifier
- Parameters
identifier (str) – the string representing either the CAS number, the identifier or the name of a compound
- Keyword Arguments
attribute (str, optional) – The name of the physical property to clear. If not provided, all physical properties will be cleared.
-
del_row
(dbrow: pyCRS.Database.CompoundRow.CompoundRow)¶ Remove a compound from the database and delete the corresponding .coskf file.
- Parameters
dbrow (CompoundRow) – the row to remove from the database
-
del_row_by_conformer_id
(conformer_id)¶ Remove the conformer from the database.
- Parameters
conformer_id (int) – An integer representing the conformer ID in the CONFORMER TABLE.
Example
db.del_row_by_conformer_id(1)
-
del_rows
(dbrows)¶ Remove multiple compounds from the database and delete the corresponding .coskf files.
- Parameters
dbrows (list) – the rows to remove from the database, represented as a list of CompoundRow objects
Example
db.del_rows(db.get_compounds("benzene"))
-
estimate_physical_property
(identifier=None, compound_id=None)¶ Estimate the physical properties using the property prediction tool and add the values to the PropPred TABLE in the database
- Keyword Arguments
identifier (str or list) – a string or a list of strings representing either the CAS number, the identifier or the name of a compound
compound_id (int or list) – an integer or a list of integers representing the compound ID(s).
Note
The QSPR descriptors used in the property prediction tool are determined from the SMILES string. The tool first attempts to use the SMILES string provided by the user via the add_compound method or the modify_attribute_by_compound_id method. If it is unavailable, the tool uses the SMILES generated by OpenBabel from the compound’s coordinates in the COSKF file. Please note that the resolved SMILES may be incorrect for some molecules, for instance when bond orders cannot be determined automatically or for charged species.
Example:
db.estimate_physical_property("Benzene")
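The SMILES selection described in the note amounts to a simple fallback. The sketch below illustrates it with a hypothetical helper, pick_smiles, which is not a pyCRS function; the argument names mirror the smiles and resolved_smiles attributes of CompoundRow, and treating an empty string as "not provided" is an assumption:

```python
from typing import Optional

def pick_smiles(smiles: Optional[str], resolved_smiles: Optional[str]) -> Optional[str]:
    """Prefer the user-provided SMILES; fall back to the one resolved from coordinates."""
    if smiles:  # treats None and "" as "not provided"
        return smiles
    return resolved_smiles or None

user_choice = pick_smiles("c1ccccc1", "C1=CC=CC=C1")  # user value wins
fallback = pick_smiles(None, "CCO")                    # resolved value is used
missing = pick_smiles(None, None)                      # nothing available
print(user_choice, fallback, missing)
```

Because the resolved SMILES can be wrong for charged species or ambiguous bond orders, supplying the SMILES explicitly is the safer route when property estimates matter.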
-
get_all_compounds
()¶ Collect all compounds in the database
- Returns
The full list of CompoundRow instances in the database
- Return type
list of CompoundRow
-
get_all_conformers
()¶ Collects all conformers in the database
- Returns
The full list of ConformerRow instances in the database.
- Return type
list of ConformerRow
-
get_all_physical_properties
(source='PhysicalProperty')¶ Collect all physical properties in the database
- Keyword Arguments
source (str) – The string should be either ‘PhysicalProperty’ or ‘PropPred’. Defaults to ‘PhysicalProperty’, returning properties from the PhysicalProperty TABLE. If set to ‘PropPred’, the estimated properties in the PropPred TABLE are returned.
- Returns
The full list of PhysicalPropertyRow instances or PropPredRow instances in the database
-
get_attribute_by_compound_id
(attributes, compound_id)¶ Retrieve the list of values for compounds with specified compound_id(s)
- Parameters
attributes (str or list) – A string or a list of strings naming the attributes to retrieve from the COMPOUND TABLE.
compound_id (int or list) – An integer or a list of integers used to search for compounds in the COMPOUND TABLE.
- Returns
A list of tuples containing the values of the specified attributes for the compounds.
- Return type
list of attributes
-
get_compounds
(values)¶ Retrieves compounds from the COMPOUND TABLE in the database by matching CAS number, chemical identifier, or name.
- Parameters
values (str or list) – A string or a list of strings used for searching, representing CAS numbers, chemical identifiers, or names.
- Returns
A list of CompoundRow instances that match the search criteria
- Return type
list of CompoundRow
-
get_compounds_id
(values)¶ Retrieves compound IDs from the COMPOUND TABLE in the database by matching CAS number, chemical identifier, or name.
- Parameters
values (str or list) – A string or a list of strings used for searching, representing CAS numbers, chemical identifiers, or names.
- Returns
A list of compound IDs that match the search criteria.
- Return type
list of int
-
get_conformers
(values)¶ Retrieves conformers from the CONFORMER TABLE in the database by matching CAS number, chemical identifier, or name.
- Parameters
values (str or list) – A string or a list of strings used for searching, representing CAS numbers, chemical identifiers, or names.
- Returns
A list of ConformerRow instances that match the search criteria.
- Return type
list of ConformerRow
-
get_physical_properties
(identifier=None, compound_id=None, source='PhysicalProperty')¶ Collect physical properties in the database by matching CAS number, chemical identifier, name or compound id.
- Keyword Arguments
identifier (str or list) – a string or a list of strings representing either the CAS number, the identifier or the name of a compound
compound_id (int or list) – An integer or a list of integers representing the compound ID(s) in the database.
source (str) – The string should be either ‘PhysicalProperty’ or ‘PropPred’. Defaults to ‘PhysicalProperty’, returning properties from the PhysicalProperty TABLE. If set to ‘PropPred’, the estimated properties in the PropPred TABLE are returned.
- Returns
The list of PhysicalPropertyRow instances or PropPredRow instances in the database
- Return type
list of PhysicalPropertyRow or PropPredRow
-
modify_attribute_by_compound_id
(attribute, value, compound_id)¶ Modify the attribute value for an entry associated with the compound id.
- Parameters
attribute (str) – the attribute to be modified. It can be one of the following: ‘name’, ‘cas’, ‘identifier’, ‘smiles’, ‘nring’.
value (str) – the new value of the specified attribute for the compound ID.
compound_id (int) – an integer representing the compound ID.
Example:
db.modify_attribute_by_compound_id("identifier","InChI=1S/C6H6/c1-2-4-6-5-3-1/h1-6H", 0)
-
update_compound_by_conformer_id
(compound_id, conformer_id)¶ Update the data for a compound ID row in the COMPOUND TABLE using the data from a conformer ID row in the CONFORMER TABLE.
- Parameters
compound_id (int) – An integer representing the compound ID corresponding to a specific row in the COMPOUND TABLE of the database
conformer_id (int) – An integer representing the conformer ID corresponding to a specific row in the CONFORMER TABLE of the database
-
update_compound_by_lowestE
(compound_id=None)¶ Update the data for a compound ID row in the COMPOUND TABLE using the data from the lowest-energy conformer ID row with the same compound ID in the CONFORMER TABLE.
- Keyword Arguments
compound_id (int or list) – An integer or a list of integers representing the compound ID(s) of specific rows in the COMPOUND TABLE of the database. If compound_id is not specified, the method is applied to the whole database.
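The selection step behind this method can be sketched as picking, for each compound ID, the conformer with the lowest energy. The Conformer class and the lowest_energy_by_compound function below are hypothetical stand-ins, not pyCRS objects, and which energy the library actually compares (for example Egas or Ecosmo) is not stated here, so the sketch uses a generic energy field:

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Conformer:  # hypothetical stand-in, not pyCRS.Database.ConformerRow
    conformer_id: int
    compound_id: int
    energy: float  # kcal/mol; which energy the library compares is an assumption

def lowest_energy_by_compound(conformers: List[Conformer]) -> Dict[int, Conformer]:
    """Map each compound_id to its conformer with the lowest energy."""
    best: Dict[int, Conformer] = {}
    for conf in conformers:
        current = best.get(conf.compound_id)
        if current is None or conf.energy < current.energy:
            best[conf.compound_id] = conf
    return best

rows = [
    Conformer(1, 1, -10.2),
    Conformer(2, 1, -11.5),
    Conformer(3, 2, -3.0),
]
best = lowest_energy_by_compound(rows)
print({cid: c.conformer_id for cid, c in best.items()})
```

Calling the function on all rows corresponds to leaving compound_id unspecified; filtering the input list first corresponds to passing one or more compound IDs.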
-
visualize_conformers
(compound_id)¶ Visualize a set of conformers in order of conformer ID
- Parameters
compound_id (int) – an integer representing the compound ID.
-
class
pyCRS.Database.
CompoundRow
(compound_id: int, conformer_id: int, name: str, cas: str, identifier: str, smiles: str, resolved_smiles: str, coskf: str, Egas: float, Ecosmo: float, nring: int)¶ A data class representing the contents of a row in the COMPOUND TABLE in COSKFDatabase
-
compound_id
¶ A unique identifier for a specific row in the COMPOUND TABLE of the database
- Type
int
-
conformer_id
¶ A unique identifier for a specific row in the CONFORMER TABLE of the database
- Type
int
-
name
¶ The name associated with the row in the COMPOUND TABLE
- Type
str
-
cas
¶ The CAS number associated with the row, i.e., the compound
- Type
str
-
identifier
¶ The chemical identifier associated with the row, i.e., the compound
- Type
str
-
smiles
¶ The SMILES string provided by user
- Type
str
-
resolved_smiles
¶ The derived SMILES string obtained using OpenBabel from the coordinates in the COSKF file.
- Type
str
-
coskf
¶ The filename of the .coskf file stored in the local SCM_PYCRS_COSKF_DB directory
- Type
str
-
Egas
¶ The gas phase bond energy rounded to 3 decimal places in kcal/mol
- Type
float
-
Ecosmo
¶ The bond energy in a perfect conductor rounded to 3 decimal places in kcal/mol
- Type
float
-
nring
¶ The number of ring atoms
- Type
int
-
db_path
¶ The path to the .coskf file directory
- Type
str
-
get_full_coskf_path
()¶ Returns the full path of the corresponding .coskf file
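One plausible reading of how the coskf and db_path attributes relate to this method is a simple path join. The sketch below uses a hypothetical stand-in class, RowSketch, rather than the real CompoundRow, and the join itself is an assumption, since the actual implementation is not shown here:

```python
import os
from dataclasses import dataclass

@dataclass
class RowSketch:  # hypothetical stand-in, not the real CompoundRow
    coskf: str    # file name of the .coskf file
    db_path: str  # directory that holds the .coskf files

    def get_full_coskf_path(self) -> str:
        # Assumption: the full path is the directory joined with the file name.
        return os.path.join(self.db_path, self.coskf)

row = RowSketch(coskf="Benzene.coskf", db_path=os.path.join("my_db", "coskf"))
full_path = row.get_full_coskf_path()
print(full_path)
```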
-
read_coskf
()¶ Opens the .coskf file corresponding to the database entry and returns a scm.plams.KFFile instance
-
-
class
pyCRS.Database.
ConformerRow
(conformer_id: int, compound_id: int, name: str, cas: str, identifier: str, smiles: str, resolved_smiles: str, coskf: str, Egas: float, Ecosmo: float, nring: int)¶ A data class representing the contents of a row in the CONFORMER TABLE in COSKFDatabase
-
conformer_id
¶ A unique identifier for a specific row in the CONFORMER TABLE of the database
- Type
int
-
compound_id
¶ A unique identifier for a specific row in the COMPOUND TABLE of the database
- Type
int
-
name
¶ The name associated with the row in the CONFORMER TABLE
- Type
str
-
cas
¶ The CAS number associated with the row, i.e., the compound
- Type
str
-
identifier
¶ The chemical identifier associated with the row, i.e., the compound
- Type
str
-
smiles
¶ The SMILES string provided by user
- Type
str
-
resolved_smiles
¶ The derived SMILES string obtained using OpenBabel from the coordinates in the COSKF file
- Type
str
-
coskf
¶ The filename of the .coskf file stored in the local SCM_PYCRS_COSKF_DB directory
- Type
str
-
Egas
¶ The gas phase bond energy rounded to 3 decimal places in kcal/mol
- Type
float
-
Ecosmo
¶ The bond energy in a perfect conductor rounded to 3 decimal places in kcal/mol
- Type
float
-
nring
¶ The number of ring atoms
- Type
int
-
db_path
¶ The path to the .coskf file directory
- Type
str
-
get_full_coskf_path
()¶ Returns the full path of the corresponding .coskf file
-
read_coskf
()¶ Opens the .coskf file corresponding to the database entry and returns a scm.plams.KFFile instance
-
-
class
pyCRS.Database.
PhysicalPropertyRow
(compound_id: int, meltingpoint: float, hfusion: float, cpfusion: float, boilingpoint: float, density: float, flashpoint: float, dielectricconstant: float, vp_equation: str, vp_params: str, tvap: float, pvap: float, Mn: float)¶ A data class representing the contents of a row in the PhysicalProperty TABLE in COSKFDatabase
-
compound_id
¶ A unique identifier for a specific row in the COMPOUND TABLE of the database
- Type
int
-
meltingpoint
¶ melting temperature (K)
- Type
float
-
hfusion
¶ enthalpy of fusion (kcal/mol)
- Type
float
-
cpfusion
¶ heat capacity of fusion (kcal/mol-K) calculated as the difference between the heat capacity in the liquid state and the heat capacity in the solid state.
- Type
float
-
boilingpoint
¶ boiling point (K)
- Type
float
-
density
¶ liquid density (kg/L)
- Type
float
-
flashpoint
¶ flash point (K)
- Type
float
-
dielectricconstant
¶ dielectric constant
- Type
float
-
vp_equation
¶ The vapor pressure equation to use, with the pressure in bar. Options include ANTOINE, VPM1 and DIPPR101
- Type
str
-
vp_params
¶ Parameters for the vp_equation, expressed as “A, B, C, D, E”
- Type
str
-
tvap
¶ Temperature (K) at pvap
- Type
float
-
pvap
¶ Pressure (bar) at tvap
- Type
float
-
Mn
¶ polymer average molecular weight (g/mol)
- Type
float
- Vapor Pressure Equations:
- ANTOINE:
log10(P) = A - B/(C+T)
- DIPPR101:
ln(P) = A + B/T + C*ln(T) + D*T**E
- VPM1:
ln(P) = A/T + B*ln(T) + C*T + D
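The three equations above translate directly into code. The following is a minimal sketch, not the library's implementation: the function names and the parse_vp_params helper are hypothetical, T is assumed to be in K, and P is in bar as stated for vp_equation. The vp_params string is split on commas into the coefficients A, B, C, and so on:

```python
import math

def antoine(T, A, B, C):
    """ANTOINE: log10(P) = A - B/(C+T)."""
    return 10.0 ** (A - B / (C + T))

def dippr101(T, A, B, C, D, E):
    """DIPPR101: ln(P) = A + B/T + C*ln(T) + D*T**E."""
    return math.exp(A + B / T + C * math.log(T) + D * T**E)

def vpm1(T, A, B, C, D):
    """VPM1: ln(P) = A/T + B*ln(T) + C*T + D."""
    return math.exp(A / T + B * math.log(T) + C * T + D)

def parse_vp_params(vp_params):
    """Split a string such as "A, B, C" into a list of floats."""
    return [float(p) for p in vp_params.split(",")]

# Antoine parameters for benzene, as used in the add_physical_property example.
params = parse_vp_params("4.72583, 1660.652, -1.461")
p_350 = antoine(350.0, *params)
print(f"{p_350:.3f}")
```

Each equation takes a different number of coefficients (three for ANTOINE, five for DIPPR101, four for VPM1), so the length of the parsed list should match the equation named in vp_equation.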
-
-
class
pyCRS.Database.
PropPredRow
(compound_id: int, adopt_smiles: str, meltingpoint: float, hfusion: float, boilingpoint: float, density: float, flashpoint: float, dielectricconstant: float, vp_equation: str, vp_params: str)¶ A data class representing the contents of a row in the PropPred TABLE in COSKFDatabase
-
compound_id
¶ A unique identifier for a specific row in the COMPOUND TABLE of the database
- Type
int
-
adopt_smiles
¶ The SMILES used for QSPR method
- Type
str
-
meltingpoint
¶ melting temperature (K)
- Type
float
-
hfusion
¶ enthalpy of fusion (kcal/mol)
- Type
float
-
boilingpoint
¶ boiling point (K)
- Type
float
-
density
¶ liquid density (kg/L)
- Type
float
-
flashpoint
¶ flash point (K)
- Type
float
-
dielectricconstant
¶ dielectric constant
- Type
float
-
vp_equation
¶ The vapor pressure equation to use, with the pressure in bar: VPM1
- Type
str
-
vp_params
¶ Parameters for the vp_equation, expressed as “A, B, C, D, E”
- Type
str
-