Propersea - Property Prediction
Propersea is an online resource that provides predictions of a range of molecular and physicochemical properties for small molecules. The predicted properties include melting point, boiling point, density, logP, solubility, polarizability and more. It also predicts the IUPAC name of the molecule.
Propersea is available through the PSDI Cross Data Search service, where it can be searched using a SMILES string, an InChI (including the InChI= prefix), or a structure. Once the search is complete, the results of the predictions for that molecule are shown.
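As a rough illustration of the two text inputs, the sketch below tells an InChI from a SMILES string by checking for the InChI= prefix. The function name `classify_query` is hypothetical and is not part of Propersea or the PSDI Cross Data Search service. It assumes that anything without the prefix is SMILES and does no chemical validation.

```python
def classify_query(query: str) -> str:
    """Label a text query as 'inchi' or 'smiles' (hypothetical helper)."""
    text = query.strip()
    if not text:
        raise ValueError("empty query")
    # InChI strings begin with the literal prefix "InChI=".
    if text.startswith("InChI="):
        return "inchi"
    # Assumption: everything else is treated as SMILES, without validation.
    return "smiles"


# Ethanol written both ways.
ethanol_smiles = "CCO"
ethanol_inchi = "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"
print(classify_query(ethanol_smiles))
print(classify_query(ethanol_inchi))
```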
Property prediction
The properties are predicted through a variety of algorithms, including:
- RDKit algorithms
- Semi-empirical quantum methods
- Fragment/atom contribution calculations
- Bayesian Additive Regression Trees
- Transformer neural networks
The predicted value is returned in the results interface. For properties predicted using the Bayesian algorithms, the results also include a 95% confidence interval, along with a measure of how well the molecule compares with the molecules in the training set. Where a property prediction is deemed nonsensical because of the predicted phase, the property may be omitted from the results.
Propersea performs best for organic compounds; performance on inorganics, organometallics and inorganic-organic mixtures is known to be lower.
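One way to handle results like these in your own scripts is sketched below. It allows for an interval that is present only for some properties, and for a property that is missing altogether. The `Prediction` class, the `describe` function and all numbers are hypothetical placeholders; they do not reflect Propersea's actual output format or real predicted values.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Prediction:
    """Hypothetical container for one predicted property."""
    name: str
    value: float
    lower: Optional[float] = None  # lower bound of the 95% interval, if reported
    upper: Optional[float] = None  # upper bound of the 95% interval, if reported

    def has_interval(self) -> bool:
        return self.lower is not None and self.upper is not None

    def interval_width(self) -> Optional[float]:
        """Width of the interval; a wider interval means a less certain prediction."""
        if not self.has_interval():
            return None
        return self.upper - self.lower


def describe(results: Dict[str, Prediction], name: str) -> str:
    """Format one property, allowing for omitted properties and missing intervals."""
    pred = results.get(name)
    if pred is None:
        # The property may have been omitted from the results.
        return f"{name}: not reported"
    if pred.has_interval():
        return f"{name}: {pred.value} (95% interval {pred.lower} to {pred.upper})"
    return f"{name}: {pred.value}"


# Placeholder numbers for illustration only, not Propersea output.
results = {
    "melting point": Prediction("melting point", 100.0, lower=90.0, upper=110.0),
    "logP": Prediction("logP", 1.5),
}
for prop in ("melting point", "logP", "boiling point"):
    print(describe(results, prop))
```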
IUPAC name prediction
Propersea also features a novel machine learning model for generating IUPAC names. It is a sequence-to-sequence model that predicts the IUPAC name from the molecule's InChI string. The model was trained on a dataset of 10 million compounds and tested on a dataset of 200,000 compounds, achieving an accuracy of 90.7%, measured as a complete match to the IUPAC name. The model performs extremely well with organic compounds, and also handles isomers/tautomers that are adequately described by the InChI.
However, the current model does not perform well on inorganics, organometallics, and inorganic-organic mixtures. This is likely due in part to the limitations of InChI in describing these molecules, and in part to the quality and quantity of such molecules in the training dataset. Work is ongoing to improve the performance of the model in these areas.
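To make the accuracy figure concrete, 90.7% of a 200,000 compound test set corresponds to about 181,400 names. The sketch below computes this kind of accuracy on a toy list. It assumes that a "complete match" means exact string equality; the evaluation used for the model may normalise names differently. The function name and the four example names are illustrative only.

```python
from typing import Sequence


def exact_match_accuracy(predicted: Sequence[str], reference: Sequence[str]) -> float:
    """Fraction of predicted names identical to the reference names."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        raise ValueError("cannot compute accuracy on an empty test set")
    # A name counts only if the whole string matches (assumed definition).
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(reference)


# Toy data for illustration: one of the four names is wrong.
reference = ["ethanol", "propan-2-ol", "benzene", "acetic acid"]
predicted = ["ethanol", "propan-1-ol", "benzene", "acetic acid"]
print(exact_match_accuracy(predicted, reference))  # 3 of 4 correct -> 0.75
```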
For more information about this model, see "Translating the molecules: adapting neural machine translation to predict IUPAC names from a chemical identifier".
What to do next
Related links:
- Creator: Cerys Willoughby
- Last modified date: 2025-04-11
- License: CC-BY-4.0
If you would like to contribute content to the PSDI Knowledge Base or have feedback you would like to give on this guidance, please contact us.