I just read a paper titled 'Trust, But Verify: On the Importance of Chemical Structure Curation in Cheminformatics and QSAR Modeling Research'.
The paper says that 'Most cheminformatics and QSAR software does not treat inorganic molecules, because the majority of molecular descriptors can be computed for organic compounds only.' It also says that, at present, all inorganic compounds must be removed before the descriptors are calculated.
My major is biology, so I don't fully understand the paper I mentioned above.
This is the title and abstract of the paper that I read:
J Chem Inf Model. Author manuscript; available in PMC 2011 Jul 26.
Published in final edited form as:
J Chem Inf Model. 2010 Jul 26; 50(7): 1189–1204.
doi: 10.1021/ci100176x
PMCID: PMC2989419
NIHMSID: NIHMS216709
Trust, but verify: On the importance of chemical structure curation in cheminformatics and QSAR modeling research
Denis Fourches, Eugene Muratov, and Alexander Tropsha
The publisher's final edited version of this article is available at J Chem Inf Model
Abstract
Molecular modelers and cheminformaticians typically analyze experimental data generated by other scientists. Consequently, when it comes to data accuracy, cheminformaticians are always at the mercy of data providers who may inadvertently publish (partially) erroneous data. Thus, dataset curation is crucial for any cheminformatics analysis such as similarity searching, clustering, QSAR modeling, virtual screening, etc., especially nowadays when the availability of chemical datasets in public domain has skyrocketed in recent years. Despite the obvious importance of this preliminary step in the computational analysis of any dataset, there appears to be no commonly accepted guidance or set of procedures for chemical data curation. The main objective of this paper is to emphasize the need for a standardized chemical data curation strategy that should be followed at the onset of any molecular modeling investigation. Herein, we discuss several simple but important steps for cleaning chemical records in a database including the removal of a fraction of the data that cannot be appropriately handled by conventional cheminformatics techniques. Such steps include the removal of inorganic and organometallic compounds, counterions, salts and mixtures; structure validation; ring aromatization; normalization of specific chemotypes; curation of tautomeric forms; and the deletion of duplicates. To emphasize the importance of data curation as a mandatory step in data analysis, we discuss several case studies where chemical curation of the original “raw” database enabled the successful modeling study (specifically, QSAR analysis) or resulted in a significant improvement of model's prediction accuracy. We also demonstrate that in some cases rigorously developed QSAR models could be even used to correct erroneous biological data associated with chemical compounds. 
We believe that good practices for curation of chemical records outlined in this paper will be of value to all scientists working in the fields of molecular modeling, cheminformatics, and QSAR studies.
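To make the "removal of inorganic compounds" step from the abstract more concrete, here is a minimal Python sketch. It is not the procedure from the paper: the records are made up, and the rule "organic means the formula contains both C and H" is a crude assumption chosen only for illustration. Real curation workflows use a cheminformatics toolkit and handle many edge cases that this ignores.

```python
import re

# Hypothetical records (name, molecular formula); not data from the paper.
RECORDS = [
    ("ethanol", "C2H6O"),
    ("sodium chloride", "NaCl"),
    ("titanium dioxide", "TiO2"),
    ("benzene", "C6H6"),
    ("calcium carbonate", "CaCO3"),
]

# An element symbol is one uppercase letter, optionally followed by one lowercase letter.
ELEMENT = re.compile(r"[A-Z][a-z]?")


def elements(formula):
    """Return the set of element symbols found in a simple molecular formula."""
    return set(ELEMENT.findall(formula))


def looks_organic(formula):
    """Crude assumption: a compound is 'organic' if it has both carbon and hydrogen."""
    found = elements(formula)
    return "C" in found and "H" in found


# Split the dataset before any descriptors are calculated.
kept = [name for name, formula in RECORDS if looks_organic(formula)]
removed = [name for name, formula in RECORDS if not looks_organic(formula)]

print("kept:", kept)
print("removed:", removed)
```

Note that the symbol-based parsing keeps "Cl" and "Ca" from being mistaken for carbon, and that calcium carbonate is removed even though it contains carbon, because it has no hydrogen under this simplified rule.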
I believe that the main reason inorganic compounds are not used in QSAR and cheminformatics is the lack of sufficient accurate experimental data. Once enough data become available, software will be adapted to run QSAR and cheminformatics on them without difficulty.
A technical reason is that most of the molecular descriptors are designed for typical molecules with covalent bonds. Another reason is that QSAR studies are strongly supported by pharmaceutical companies, where organic substances play a central role. We discuss a bit of this in a recent publication.
Regards,
Guillermo
Article Mereology of Quantitative Structure-Activity Relationships Models
All graph indices can be calculated for both organics and inorganics.
For instance, you can represent the structure of an inorganic compound as a crystal cell (an ordered graph), and all mathematical indices will be available for this type of representation.
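As a small illustration of the point about graph indices, the Python sketch below computes the Wiener index (the sum of shortest-path distances over all pairs of nodes) for a tiny rock-salt-like fragment. The eight-site cube with alternating Na and Cl is my own simplified stand-in for a crystal cell, not a structure taken from any of the papers mentioned here, and the function names are hypothetical.

```python
from collections import deque
from itertools import product


def cube_graph():
    """Eight lattice sites on the corners of a cube; neighbours differ in one coordinate."""
    nodes = list(product((0, 1), repeat=3))
    adjacency = {node: [] for node in nodes}
    for a in nodes:
        for b in nodes:
            if sum(x != y for x, y in zip(a, b)) == 1:
                adjacency[a].append(b)
    return adjacency


def bfs_distances(adjacency, start):
    """Shortest-path distance (in bonds) from start to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def wiener_index(adjacency):
    """Sum of shortest-path distances over all unordered pairs of nodes."""
    total = sum(d for node in adjacency for d in bfs_distances(adjacency, node).values())
    return total // 2  # every pair was counted twice


adjacency = cube_graph()
# Alternate the two species so that every Na is bonded only to Cl, and vice versa.
species = {node: ("Na" if sum(node) % 2 == 0 else "Cl") for node in adjacency}
w = wiener_index(adjacency)
print("Wiener index of the fragment:", w)
```

Nothing in the calculation depends on which elements sit on the nodes, which is exactly why purely topological indices carry over to inorganic structures. Whether such an index is a meaningful descriptor for a given property is a separate question.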
Most other parameters may also work for inorganics (in principle). Developers are just too lazy to define types for all atoms in their software. However, the quality of more specific parameters (e.g. cation polarization power, cation index, cell parameters) is higher.
I am developing and analyzing various specific descriptors for inorganic nanoparticles. In the attached paper you may find some useful thoughts and a link to a nice software package that is able to handle inorganics.
Article From Basic Physics to Mechanisms of Toxicity: Liquid Drop Ap...
Article Causal inference methods to assist in mechanistic interpreta...
I agree with Natalia: most developers are too lazy to set atom types in their software. It is possible to include inorganic atoms in QSAR studies, but you need meaningful descriptors that take into account the properties of the inorganic atoms.
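As a rough sketch of what a descriptor built from atomic properties could look like, the Python snippet below computes a composition-weighted mean Pauling electronegativity from a formula. This particular descriptor is only an illustrative assumption on my part, not one proposed in this thread or in the cited papers, and the small lookup table covers just the three elements used in the example.

```python
import re

# Pauling electronegativities for the few elements used in this example.
PAULING_EN = {"O": 3.44, "Ti": 1.54, "Zn": 1.65}

# One element symbol followed by an optional integer count, e.g. "O2".
TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def composition(formula):
    """Parse a simple formula such as 'TiO2' into {'Ti': 1, 'O': 2}."""
    counts = {}
    for symbol, number in TOKEN.findall(formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts


def mean_electronegativity(formula):
    """Composition-weighted mean electronegativity (illustrative descriptor)."""
    counts = composition(formula)
    n_atoms = sum(counts.values())
    return sum(PAULING_EN[symbol] * n for symbol, n in counts.items()) / n_atoms


for oxide in ("ZnO", "TiO2"):
    print(oxide, round(mean_electronegativity(oxide), 3))
```

The parser handles only flat formulas without brackets or hydrates, and it raises a KeyError for any element missing from the table, which at least makes it obvious when an atom type has not been defined.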
You can calculate QSAR properties of inorganic, organometallic, and organic molecules at www.chemosophia.com. There are no limitations on atomic composition.
As already mentioned, QSAR was born to treat organic molecules, that is, fully connected molecules with covalent bonds. Many molecular descriptors were originally proposed and evaluated on sets of organic molecules. Additionally, there is a lot of experimental and theoretical data on organic molecules.
Anyway, most descriptors can be calculated for both organic and inorganic molecules.
Dragon, one of the most cited software packages for molecular descriptor calculation, can handle both organic and inorganic molecules, fully connected or not.