BenchmarkSet1500: High-Accuracy Excited-State Reference Benchmark Dataset for Organic Semiconductors
The BenchmarkSet1500 dataset is the first curated collection of organic molecules with high-level excited-state calculations at this scale, developed to enable reproducible computational chemistry and data-driven discovery. It integrates multiple levels of electronic structure theory (TD-DFT, CASSCF, NEVPT2), allowing direct, systematic comparison between methods.
Each entry includes molecular identifiers alongside consistently computed electronic properties. The dataset is provided both as a searchable database and as a structured CSV file, enabling immediate integration into machine learning pipelines and computational workflows.
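As a rough illustration of how the CSV file might feed a method comparison, the sketch below parses a few rows and computes the mean absolute deviation between two levels of theory. The column names (`smiles`, `s1_tddft_ev`, `s1_nevpt2_ev`) and the sample values are assumptions made for this example, not the dataset's actual schema or data; check the downloaded file's header before adapting it.

```python
import csv
import io

# Placeholder rows, NOT real BenchmarkSet1500 values. The column names are
# assumptions made for this sketch; the real CSV header may differ.
SAMPLE_CSV = """smiles,s1_tddft_ev,s1_nevpt2_ev
c1ccccc1,5.40,5.10
c1ccc2ccccc2c1,4.50,4.30
c1ccc2cc3ccccc3cc2c1,3.30,3.60
"""


def load_rows(handle):
    """Parse CSV rows into dicts, converting the energy columns to float."""
    rows = []
    for record in csv.DictReader(handle):
        rows.append({
            "smiles": record["smiles"],
            "s1_tddft_ev": float(record["s1_tddft_ev"]),
            "s1_nevpt2_ev": float(record["s1_nevpt2_ev"]),
        })
    return rows


def mean_absolute_deviation(rows, method_col, reference_col):
    """Mean absolute deviation of one method's energies from a reference."""
    if not rows:
        raise ValueError("no rows to compare")
    total = sum(abs(r[method_col] - r[reference_col]) for r in rows)
    return total / len(rows)


# An in-memory file stands in for the downloaded CSV so the sketch runs as-is.
rows = load_rows(io.StringIO(SAMPLE_CSV))
mad = mean_absolute_deviation(rows, "s1_tddft_ev", "s1_nevpt2_ev")
print(f"MAD (TD-DFT vs NEVPT2): {mad:.2f} eV")
```

To run this against the real file, replace `io.StringIO(SAMPLE_CSV)` with `open(path, newline="")` and adjust the column names to match the actual header.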
Applications
- Machine learning: The dataset's scale and high-accuracy calculations make it suitable for training predictive and generative models of organic semiconductors
- Optoelectronic screening: The dataset can be screened for candidate materials showing thermally activated delayed fluorescence (TADF) or inverted singlet–triplet gaps, both relevant to organic light-emitting diodes (OLEDs)
- Structure–property relationships: Because the electronic properties are computed consistently across entries, trends linking molecular structure to those properties can be derived systematically
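The optoelectronic screening use case can be sketched with the singlet–triplet gap, ΔE_ST = E(S1) − E(T1). The example below is a minimal illustration, not the dataset authors' screening protocol: the `Entry` fields, the identifiers, and the energies are hypothetical, and the 0.2 eV cutoff for TADF candidates is a rule-of-thumb assumption that should be tuned to the application.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    """One molecule with its lowest singlet and triplet excitation energies.

    Field names are hypothetical; map them to the real CSV columns.
    """
    identifier: str
    s1_ev: float
    t1_ev: float

    @property
    def delta_e_st(self):
        """Singlet-triplet gap in eV: E(S1) - E(T1)."""
        return self.s1_ev - self.t1_ev


def classify(entry, tadf_max_gap_ev=0.2):
    """Label an entry by its singlet-triplet gap.

    tadf_max_gap_ev is an assumed cutoff, not a value taken from the dataset.
    """
    gap = entry.delta_e_st
    if gap < 0:
        return "inverted"        # S1 lies below T1
    if gap <= tadf_max_gap_ev:
        return "tadf_candidate"  # small positive gap
    return "other"


# Placeholder entries, NOT real BenchmarkSet1500 values.
entries = [
    Entry("mol-a", s1_ev=2.90, t1_ev=2.80),
    Entry("mol-b", s1_ev=1.00, t1_ev=1.10),
    Entry("mol-c", s1_ev=3.50, t1_ev=2.40),
]

labels = {e.identifier: classify(e) for e in entries}
for name, label in labels.items():
    print(name, label)
```

A small gap is only one screening criterion; a practical workflow would likely combine it with other computed properties available in the dataset.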
What to do next
- Access and download the dataset from the BenchmarkSet1500 Community Data Collection
Related links:
- Find out about our other Partner projects
- Learn about our other Community Data Collections
- Get started with PSDI
- Creators: Tahereh Nematiaram, Malin Zollner
- Last modified date: 2026-03-27
- License: CC-BY-4.0
- Citation: Tahereh Nematiaram and Malin Zollner, BenchmarkSet1500: High-Accuracy Excited-State Reference Benchmark Dataset for Organic Semiconductors, https://guidance.psdi.ac.uk/docusaurus-pages/docs/guidance/partner-projects/benchmarkset1500, PSDI (modified 2026-03-27)
If you would like to contribute content to the PSDI Knowledge Base or have feedback you would like to give on this guidance, please contact us.