Database¶
The submodule contains several classes that provide an interface to an SQL database for managing COSKF files and physical properties.
- class pyCRS.Database.COSKFDatabase(path)¶
A class that provides an interface to an SQL database containing the following tables.
| Table name | Description |
| --- | --- |
| Compound | contains unique compounds along with their COSKF file, based on either the CAS number or any preferred identifier |
| Conformer | contains multiple conformers along with their COSKF file |
| PhysicalProperty | contains the physical properties input by the user |
| PropPred | contains the physical properties estimated using QSPR methods based on SMILES |
- Parameters:
path (str) – a path to the database file. If this file does not exist, it will be created.
- add_compound(coskf_file, name=None, cas=None, identifier=None, coskf_path=None, smiles=None, nring=None, ignore_smiles_check=False, ignore_duplicates=False)¶
Adds a new .coskf file to the database.
- Parameters:
coskf_file (str) – a path to the .coskf file, or alternatively, the file name of the .coskf file if coskf_path is provided.
- Keyword Arguments:
name (str) – The entry’s name, such as the compound name. If not provided, it will prioritize using the IUPAC name, CAS number, identifier, or the name of the .coskf file, depending on which values are provided through the add_compound() method or stored in the ‘Compound Data’ section of the .coskf file.
cas (str) – The CAS number of the molecule. If not provided, it will attempt to use the CAS number within the .coskf file, if available.
identifier (str) – The chemical identifier of the molecule.
coskf_path (str) – The directory path to the .coskf file. If not provided, it will attempt to locate the path of the ADFCRS-2018 database.
smiles (str) – The SMILES string of the molecule. If not provided, it will attempt to use the SMILES within the .coskf file, if available.
nring (int) – The number of ring atoms. If not provided, it will attempt to use the Nring value within the .coskf file, if available.
ignore_smiles_check (bool) – If set to True, skip generating the SMILES from the compound’s coordinates to confirm its identity against the database. Default is False.
ignore_duplicates (bool) – If set to True, skip duplicate recognition using UniqueConformersCrest in the AMSConformer tool. Default is False.
Note
Ensure every compound has a unique representation, either by CAS number or by a preferred identifier. During the add_compound process, both the CAS number and the identifier are checked for uniqueness in the database. If multiple compounds share the same CAS number and identifier, an error will be raised. For instance, the operation below is not allowed since both compounds share the same identifier='CRS0001':
db.add_compound("Benzene.coskf", cas="71-43-2", identifier="CRS0001")
db.add_compound("Ethanol.coskf", cas="64-17-5", identifier="CRS0001")
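The case described in the note can be caught before anything is written to the database. The following is a minimal sketch, not part of pyCRS: the helper `find_identifier_conflicts` and the `planned` list are hypothetical names introduced here, and the check only covers the documented situation of one identifier being attached to more than one CAS number.

```python
from collections import defaultdict


def find_identifier_conflicts(entries):
    """Return {identifier: sorted CAS numbers} for identifiers tied to more than one CAS.

    `entries` is an iterable of (coskf_file, cas, identifier) tuples, i.e. the
    arguments one intends to pass to add_compound (hypothetical input format).
    """
    cas_by_identifier = defaultdict(set)
    for _coskf_file, cas, identifier in entries:
        if identifier is not None:
            cas_by_identifier[identifier].add(cas)
    return {
        identifier: sorted(cas_set, key=str)
        for identifier, cas_set in cas_by_identifier.items()
        if len(cas_set) > 1
    }


planned = [
    ("Benzene.coskf", "71-43-2", "CRS0001"),
    ("Ethanol.coskf", "64-17-5", "CRS0001"),  # reuses CRS0001 for a different CAS
    ("Water.coskf", "7732-18-5", "CRS0002"),
]
conflicts = find_identifier_conflicts(planned)
print(conflicts)
```

An empty result means no identifier in the planned list is shared between different CAS numbers; it does not check against compounds already stored in the database.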
- add_physical_property(identifier, attribute, value, unit=None)¶
Add a value of a physical property to the PhysicalProperty TABLE in the database by compound’s identifier
- Parameters:
identifier (str) – the string representing either the CAS number, identifier or name of a compound
attribute (str) – the name of the physical property (e.g. meltingpoint or hfusion)
value (float) – the value of the physical property
- Keyword Arguments:
unit (str) – (optional) the unit of the input value. The default units are K, kcal/mol and kcal/mol-K. The accepted units are K, C, kcal/mol, kJ/mol, cal/g, J/g, kcal/mol-K, kJ/mol-K, cal/g-K and J/g-K.
Example
db.add_physical_property('Benzene', 'meltingpoint', 278.7)
db.add_physical_property('Benzene', 'hfusion', 9.91, unit='kJ/mol')
db.add_physical_property('Benzene', 'vp_equation', 'Antoine')
db.add_physical_property('Benzene', 'vp_params', '4.72583, 1660.652, -1.461')
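To make the relation between the accepted units and the default units (K, kcal/mol, kcal/mol-K) concrete, here is a minimal sketch of the arithmetic involved. It is not the pyCRS implementation: the function `to_default_unit` and its `molar_mass` argument are hypothetical, and how pyCRS obtains the molar mass for the mass-based units is not stated in this reference.

```python
# Thermochemical calorie: 1 cal = 4.184 J
CAL_PER_J = 1.0 / 4.184


def to_default_unit(value, unit, molar_mass=None):
    """Convert `value` given in `unit` to K, kcal/mol or kcal/mol-K.

    `molar_mass` (g/mol) is only needed for the mass-based units
    cal/g, J/g, cal/g-K and J/g-K (hypothetical argument).
    """
    if unit in ("K", "kcal/mol", "kcal/mol-K"):
        return value  # already in a default unit
    if unit == "C":
        return value + 273.15  # Celsius -> Kelvin
    if unit in ("kJ/mol", "kJ/mol-K"):
        return value * CAL_PER_J  # kJ -> kcal
    if unit in ("cal/g", "J/g", "cal/g-K", "J/g-K"):
        if molar_mass is None:
            raise ValueError("molar_mass is required for mass-based units")
        per_mol = value * molar_mass  # cal/mol or J/mol
        if unit.startswith("J"):
            per_mol *= CAL_PER_J  # J -> cal
        return per_mol / 1000.0  # cal -> kcal
    raise ValueError(f"unsupported unit: {unit}")


hfusion_kcal = to_default_unit(9.91, "kJ/mol")  # benzene value from the example above
print(round(hfusion_kcal, 4))
```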
- clear_physical_property(identifier: str, attribute: str | None = None)¶
Clear the value of a physical property in PhysicalProperty TABLE in the database by compound’s identifier
- Parameters:
identifier (str) – the string representing either the CAS number, identifier or name of a compound
- Keyword Arguments:
attribute (str, optional) – The name of the physical property to clear. If not provided, all physical properties will be cleared.
- del_row(dbrow: CompoundRow)¶
Remove a compound from the database and delete the corresponding .coskf file.
- Parameters:
dbrow (CompoundRow) – the row to remove from the database
- del_row_by_conformer_id(conformer_id)¶
Remove the conformer from the database.
- Parameters:
conformer_id (int) – An integer representing the conformer ID in the CONFORMER TABLE.
Example
db.del_row_by_conformer_id(1)
- del_rows(dbrows)¶
Remove multiple compounds from the database and delete the corresponding .coskf files.
- Parameters:
dbrows (list) – the rows to remove from the database, represented as a list of CompoundRow objects
Example
db.del_rows(db.get_compounds('benzene'))
- estimate_physical_property(identifier=None, compound_id=None)¶
Estimate the physical properties using the property prediction tool and add the values to the PropPred TABLE in the database
- Keyword Arguments:
identifier (str or list) – a string or a list of strings representing either the CAS number, identifier or name of a compound
compound_id (int or list) – an integer or a list of integers representing the compound ID(s).
Note
The QSPR descriptors used in the property prediction tool are determined from the SMILES string. The tool first attempts to use the SMILES string provided by the user via the add_compound method or the modify_attribute_by_compound_id method. If that is unavailable, it uses the SMILES generated by OpenBabel from the compound’s coordinates in the COSKF file. Please note that the resolved SMILES may be incorrect for some molecules, for instance when bond orders cannot be determined automatically or for charged species.
Example :
db.estimate_physical_property("Benzene")
- get_all_compounds()¶
Collect all compounds in the database
- Returns:
The full list of CompoundRow instances in the database
- Return type:
list of CompoundRow
- get_all_conformers()¶
Collects all conformers in the database
- Returns:
The full list of ConformerRow instances in the database.
- Return type:
List of ConformerRow
- get_all_physical_properties(source='PhysicalProperty')¶
Collect all physical properties in the database
- Keyword Arguments:
source (str) – The string should be either ‘PhysicalProperty’ or ‘PropPred’. Defaults to ‘PhysicalProperty’, returning properties from the PhysicalProperty TABLE. If set to ‘PropPred’, the estimated properties in the PropPred TABLE are returned.
- Returns:
The full list of PhysicalPropertyRow instances or PropPredRow instances in the database
- get_attribute_by_compound_id(attributes, compound_id)¶
Retrieve the list of values for compounds with specified compound_id(s)
- Parameters:
attributes (str or list) – A string or a list of strings naming the attributes to retrieve from the COMPOUND TABLE.
compound_id (int or list) – An integer or a list of integers used to search for compounds in the COMPOUND TABLE.
- Returns:
A list of tuples containing the values of the specified attributes for the compounds.
- Return type:
list of attributes
- get_compounds(values)¶
Retrieves compounds from the COMPOUND TABLE in the database by matching CAS number, chemical identifier, or name.
- Parameters:
values (str or list) – A string or a list of strings used for searching, representing CAS numbers, chemical identifiers, or names.
- Returns:
A list of CompoundRow instances that match the search criteria
- Return type:
list of CompoundRow
- get_compounds_id(values)¶
Retrieves compound id from the COMPOUND TABLE in the database by matching CAS number, chemical identifier, or name.
- Parameters:
values (str or list) – A string or a list of strings used for searching, representing CAS numbers, chemical identifiers, or names.
- Returns:
A list of compound IDs that match the search criteria.
- Return type:
list of int
- get_conformers(values)¶
Retrieves conformers from the CONFORMER TABLE in the database by matching CAS number, chemical identifier, or name.
- Parameters:
values (str or list) – A string or a list of strings used for searching, representing CAS numbers, chemical identifiers, or names.
- Returns:
A list of ConformerRow instances that match the search criteria.
- Return type:
list of ConformerRow
- get_physical_properties(identifier=None, compound_id=None, source='PhysicalProperty')¶
Collect physical properties in the database by matching CAS number, chemical identifier, name or compound id.
- Keyword Arguments:
identifier (str or list) – a string or a list of strings representing either the CAS number, identifier or name of a compound
compound_id (int or list) – An integer or a list of integers representing the compound ID(s) in the database.
source (str) – The string should be either ‘PhysicalProperty’ or ‘PropPred’. Defaults to ‘PhysicalProperty’, returning properties from the PhysicalProperty TABLE. If set to ‘PropPred’, the estimated properties in the PropPred TABLE are returned.
- Returns:
The list of PhysicalPropertyRow instances or PropPredRow instances in the database
- Return type:
list of PhysicalPropertyRow or PropPredRow
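The lookup methods above can be chained. The sketch below combines get_compounds_id and get_physical_properties to collect melting points per compound ID. It relies only on the signatures and row attributes documented on this page; the helper name `melting_points` is hypothetical, and the early return for an empty ID list is a precaution, since the behaviour of get_physical_properties for an empty list is not documented here.

```python
def melting_points(db, values, source="PhysicalProperty"):
    """Map compound ID -> melting point (K) for compounds matching `values`.

    `db` is expected to behave like a COSKFDatabase instance; `values` is a
    string or list of strings (CAS numbers, identifiers or names).
    """
    compound_ids = db.get_compounds_id(values)
    if not compound_ids:
        return {}  # nothing matched, so skip the property query
    rows = db.get_physical_properties(compound_id=compound_ids, source=source)
    return {row.compound_id: row.meltingpoint for row in rows}


# Intended usage (requires an AMS installation providing pyCRS; file name is a placeholder):
# from pyCRS.Database import COSKFDatabase
# db = COSKFDatabase("my_compounds.db")
# print(melting_points(db, ["Benzene", "64-17-5"]))
# print(melting_points(db, "Benzene", source="PropPred"))
```

Passing source="PropPred" returns the QSPR estimates instead of the user-supplied values, because PropPredRow also carries compound_id and meltingpoint.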
- modify_attribute_by_compound_id(attribute, value, compound_id)¶
Modify the attribute value for an entry associated with the compound id.
- Parameters:
attribute (str) – the attribute to be modified. It can be one of the following: ‘name’, ‘cas’, ‘identifier’, ‘smiles’, ‘nring’.
value (str) – the new value of the specified attribute for the compound ID.
compound_id (int) – an integer representing the compound ID.
Example :
db.modify_attribute_by_compound_id("identifier","InChI=1S/C6H6/c1-2-4-6-5-3-1/h1-6H", 0)
- update_compound_by_conformer_id(compound_id, conformer_id)¶
Update the data for a compound ID row in the COMPOUND TABLE using the data from a conformer ID row in the CONFORMER TABLE.
- Parameters:
compound_id (int) – An integer representing the compound ID corresponding to a specific row in the COMPOUND TABLE of the database
conformer_id (int) – An integer representing the conformer ID corresponding to a specific row in the CONFORMER TABLE of the database
- update_compound_by_lowestE(compound_id=None)¶
Update the data for a compound ID row in the COMPOUND TABLE using the data from a conformer ID row with the lowest energy having the same compound ID in the CONFORMER TABLE.
- Keyword Arguments:
compound_id (int or list) – An integer or a list of integers representing the compound ID(s) that correspond to specific rows in the COMPOUND TABLE of the database. If compound_id is not specified, the method is applied to the whole database.
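The selection step described above, picking the lowest-energy conformer for each compound, can be sketched in plain Python. This is an illustration, not the pyCRS implementation: the `Conformer` class is a hypothetical stand-in for ConformerRow with only the fields used here, and comparing on `Ecosmo` is an assumption, since this reference does not say which energy the method ranks by.

```python
from dataclasses import dataclass


@dataclass
class Conformer:
    """Hypothetical stand-in for ConformerRow (subset of its fields)."""
    conformer_id: int
    compound_id: int
    Ecosmo: float  # kcal/mol


def lowest_energy_conformers(conformers, energy_attr="Ecosmo"):
    """Return {compound_id: conformer_id} of the lowest-energy conformer per compound.

    `energy_attr` names the attribute to rank by (assumed to be Ecosmo here).
    """
    best = {}
    for conf in conformers:
        current = best.get(conf.compound_id)
        if current is None or getattr(conf, energy_attr) < getattr(current, energy_attr):
            best[conf.compound_id] = conf
    return {compound_id: conf.conformer_id for compound_id, conf in best.items()}


rows = [
    Conformer(conformer_id=0, compound_id=0, Ecosmo=-10.0),
    Conformer(conformer_id=1, compound_id=0, Ecosmo=-12.5),
    Conformer(conformer_id=2, compound_id=1, Ecosmo=-3.0),
]
selected = lowest_energy_conformers(rows)
print(selected)
```

Each resulting pair has the shape expected by update_compound_by_conformer_id(compound_id, conformer_id), which is useful when a different ranking criterion than the built-in one is wanted.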
- visualize_conformers(compound_id)¶
Visualize a set of conformers in the order of the conformers id
- Parameters:
compound_id (int) – an integer representing the compound ID.
- class pyCRS.Database.CompoundRow(compound_id: int, conformer_id: int, name: str, cas: str, identifier: str, smiles: str, resolved_smiles: str, coskf: str, Egas: float, Ecosmo: float, nring: int)¶
A data class to represent the contents of a row in the COMPOUND TABLE in COSKFDatabase
- compound_id¶
A unique identifier for a specific row in the COMPOUND TABLE of the database
- Type:
int
- conformer_id¶
A unique identifier for a specific row in the CONFORMER TABLE of the database
- Type:
int
- name¶
The name associated with the row in the COMPOUND TABLE
- Type:
str
- cas¶
The CAS number associated with the row, i.e., the compound
- Type:
str
- identifier¶
The chemical identifier associated with the row, i.e., the compound
- Type:
str
- smiles¶
The SMILES string provided by the user
- Type:
str
- resolved_smiles¶
The derived SMILES string obtained using OpenBabel from the coordinates in the COSKF file.
- Type:
str
- coskf¶
The filename of the .coskf file stored in the local SCM_PYCRS_COSKF_DB directory
- Type:
str
- Egas¶
The gas phase bond energy rounded to 3 decimal places in kcal/mol
- Type:
float
- Ecosmo¶
The bond energy in a perfect conductor rounded to 3 decimal places in kcal/mol
- Type:
float
- nring¶
The number of ring atoms
- Type:
int
- db_path¶
The path to the .coskf file directory
- Type:
str
- get_full_coskf_path()¶
Returns the full path of the corresponding .coskf file
- read_coskf()¶
Opens the .coskf file corresponding to the database entry and returns a scm.plams.KFFile instance
- class pyCRS.Database.ConformerRow(conformer_id: int, compound_id: int, name: str, cas: str, identifier: str, smiles: str, resolved_smiles: str, coskf: str, Egas: float, Ecosmo: float, nring: int)¶
A data class to represent the contents of a row in the CONFORMER TABLE in COSKFDatabase
- conformer_id¶
A unique identifier for a specific row in the CONFORMER TABLE of the database
- Type:
int
- compound_id¶
A unique identifier for a specific row in the COMPOUND TABLE of the database
- Type:
int
- name¶
The name associated with the row in the CONFORMER TABLE
- Type:
str
- cas¶
The CAS number associated with the row, i.e., the compound
- Type:
str
- identifier¶
The chemical identifier associated with the row, i.e., the compound
- Type:
str
- smiles¶
The SMILES string provided by the user
- Type:
str
- resolved_smiles¶
The derived SMILES string obtained using OpenBabel from the coordinates in the COSKF file
- Type:
str
- coskf¶
The filename of the .coskf file stored in the local SCM_PYCRS_COSKF_DB directory
- Type:
str
- Egas¶
The gas phase bond energy rounded to 3 decimal places in kcal/mol
- Type:
float
- Ecosmo¶
The bond energy in a perfect conductor rounded to 3 decimal places in kcal/mol
- Type:
float
- nring¶
The number of ring atoms
- Type:
int
- db_path¶
The path to the .coskf file directory
- Type:
str
- get_full_coskf_path()¶
Returns the full path of the corresponding .coskf file
- read_coskf()¶
Opens the .coskf file corresponding to the database entry and returns a scm.plams.KFFile instance
- class pyCRS.Database.PhysicalPropertyRow(compound_id: int, meltingpoint: float, hfusion: float, cpfusion: float, boilingpoint: float, density: float, flashpoint: float, dielectricconstant: float, vp_equation: str, vp_params: str, tvap: float, pvap: float, Mn: float)¶
A data class to represent the contents of a row in the PhysicalProperty TABLE in COSKFDatabase
- compound_id¶
A unique identifier for a specific row in the COMPOUND TABLE of the database
- Type:
int
- meltingpoint¶
melting temperature (K)
- Type:
float
- hfusion¶
enthalpy of fusion (kcal/mol)
- Type:
float
- cpfusion¶
heat capacity of fusion (kcal/mol-K) calculated as the difference between the heat capacity in the liquid state and the heat capacity in the solid state.
- Type:
float
- boilingpoint¶
boiling point (K)
- Type:
float
- density¶
liquid density (kg/L)
- Type:
float
- flashpoint¶
flash point (K)
- Type:
float
- dielectricconstant¶
dielectric constant
- Type:
float
- vp_equation¶
The vapor pressure equation to use, with the pressure in bar. Options include ANTOINE, VPM1 and DIPPR101.
- Type:
str
- vp_params¶
Parameters for the vp_equation, expressed as “A, B, C, D, E”
- Type:
str
- tvap¶
Temperature (K) at pvap
- Type:
float
- pvap¶
Pressure (bar) at tvap
- Type:
float
- Mn¶
polymer average molecular weight (g/mol)
- Type:
float
- Vapor Pressure Equations:
- ANTOINE:
log10(P) = A - B/(C+T)
- DIPPR101:
ln(P) = A + B/T + C*ln(T) + D*T**E
- VPM1:
ln(P) = A/T + B*ln(T) + C*T + D
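The three equations can be evaluated directly from a vp_equation name and a vp_params string. The sketch below assumes T in K and P in bar, as stated for the attributes above; the functions `parse_vp_params` and `vapor_pressure` are hypothetical helpers, not part of pyCRS, and upper-casing the equation name is an assumption made so that both 'Antoine' and 'ANTOINE' are accepted.

```python
import math


def parse_vp_params(vp_params):
    """Turn a string such as '4.72583, 1660.652, -1.461' into a list of floats."""
    return [float(p) for p in vp_params.split(",")]


def vapor_pressure(vp_equation, vp_params, T):
    """Evaluate the vapor pressure (bar) at temperature T (K)."""
    equation = vp_equation.upper()
    p = parse_vp_params(vp_params)
    if equation == "ANTOINE":
        A, B, C = p[:3]
        return 10.0 ** (A - B / (C + T))  # log10(P) = A - B/(C+T)
    if equation == "DIPPR101":
        A, B, C, D, E = p[:5]
        return math.exp(A + B / T + C * math.log(T) + D * T**E)
    if equation == "VPM1":
        A, B, C, D = p[:4]
        return math.exp(A / T + B * math.log(T) + C * T + D)
    raise ValueError(f"unknown vapor pressure equation: {vp_equation}")


# Benzene Antoine parameters from the add_physical_property example,
# evaluated near its normal boiling point.
p_benzene = vapor_pressure("Antoine", "4.72583, 1660.652, -1.461", 353.25)
print(f"{p_benzene:.3f} bar")
```

With these parameters the result is close to 1 bar, which is a quick sanity check that the parameters and the temperature and pressure units are consistent.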
- class pyCRS.Database.PropPredRow(compound_id: int, adopt_smiles: str, meltingpoint: float, hfusion: float, boilingpoint: float, density: float, flashpoint: float, dielectricconstant: float, vp_equation: str, vp_params: str)¶
A data class to represent the contents of a row in the PropPred TABLE in COSKFDatabase
- compound_id¶
A unique identifier for a specific row in the COMPOUND TABLE of the database
- Type:
int
- adopt_smiles¶
The SMILES string used by the QSPR method
- Type:
str
- meltingpoint¶
melting temperature (K)
- Type:
float
- hfusion¶
enthalpy of fusion (kcal/mol)
- Type:
float
- boilingpoint¶
boiling point (K)
- Type:
float
- density¶
liquid density (kg/L)
- Type:
float
- flashpoint¶
flash point (K)
- Type:
float
- dielectricconstant¶
dielectric constant
- Type:
float
- vp_equation¶
The vapor pressure equation to use, with the pressure in bar. For estimated properties this is VPM1.
- Type:
str
- vp_params¶
Parameters for the vp_equation, expressed as “A, B, C, D, E”
- Type:
str