Data Revival - Making the intangible tangible: The journey from lab notebook to digital insight
Find out how AI tools can be used to extract data from paper notebooks.
About the webinar
Welcome to the Physical Sciences Data Infrastructure (PSDI) webinar series. The series is designed to communicate PSDI's work to a wider audience.
The subject of this webinar, held on 16th November 2023, is the extraction of knowledge from lab books, which is being undertaken as part of our Pathfinder activities in conjunction with the spin-out company Data Revival. The webinar was presented by Samuel Munday from Data Revival.
Abstract
The University of Southampton’s chemistry department has accumulated a wealth of chemical knowledge over the years, some of which has been captured in over 2000 lab books. The vast majority of these have sat gathering dust in a cupboard for a long time and cannot be destroyed, because of both the value of the knowledge they hold and their importance for Health and Safety reporting. However, this knowledge is intangible and difficult to access, offering no value to the department whilst taking up space and presenting a fire risk. We present our work on turning this unstructured resource into a structured, accessible database that holds FAIR data open to analytics and intelligent search. Our system utilises AI-driven natural language processing techniques, as well as chemical structure recognition, to extract all the types of chemical information required to create a useful, searchable database. This database unlocks 3000 chemist-years of knowledge and enables more efficient and accurate future research. In this webinar we discuss the process of digitising such an archive effectively, the AI tools we have created to work with such unstructured knowledge at scale, the utility of the digital database created for the chemistry department, and the feedback received from the department on the system’s potential for further development.
Bio
Samuel Munday is a senior research assistant at the University of Southampton and co-director of the fledgling spin-out Data Revival. He first became interested in scientific data management whilst building a predictive analytics platform for polymeric materials and realising that a lot of key data resided in a form incomprehensible to computers. This has led to the development of a series of tools for extracting and structuring unstructured chemical data, mainly used for turning handwritten lab notebooks into structured, searchable databases at scale. He is currently leading the development of this platform, which is beginning to show signs of success with both academic researchers and commercial partners.
Watch the recording
You can watch this recording via our YouTube channel. The slides are available on Zenodo.
What to do next
- Watch another webinar from our list
- Take a look at our self-paced learning
- Try our Tutorials
- Find current and past in-person training opportunities under Events
Related links:
- Galaxy Training
- Elixir TeSS: extensive training materials with a focus on computation in the life sciences, but many courses are also relevant for the physical sciences community.
- Creator: Cerys Willoughby
- Last modified date: 2025-03-28
If you would like to contribute content to the PSDI Knowledge Base or have feedback you would like to give on this guidance, please contact us.