This repository is the official implementation of K-MSE, which has been accepted to ACL 2025.
In this work, we introduce a Knowledge-enhanced reasoning framework for Molecular Structure Elucidation (K-MSE), which uses Monte Carlo Tree Search (MCTS) as a plug-in for test-time scaling. Specifically, we construct an external molecular substructure knowledge base to extend the LLMs' coverage of chemical structure space. Furthermore, we design a specialized molecule-spectrum scorer that acts as a reward model during reasoning, addressing the inaccurate evaluation of candidate solutions by LLMs.
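To illustrate how a tree search can use a scorer as its reward model, here is a minimal, generic UCT-style MCTS sketch. It is not the K-MSE code in this repository. `propose` is a hypothetical stand-in for LLM-generated candidate refinements (which, in K-MSE, could draw on the substructure knowledge base), and `score` is a hypothetical stand-in for the molecule-spectrum scorer, assumed to return a reward in [0, 1]. The toy target `CCO`, `toy_propose`, and `toy_score` are made up purely for illustration.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Node:
    candidate: str                         # a candidate structure, e.g. a SMILES string
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)
    visits: int = 0
    value_sum: float = 0.0

    def uct(self, c: float) -> float:
        """UCT score used during selection; unvisited nodes are tried first."""
        if self.visits == 0:
            return float("inf")
        exploit = self.value_sum / self.visits
        explore = c * math.sqrt(math.log(self.parent.visits) / self.visits)
        return exploit + explore


def mcts_search(
    root_candidate: str,
    propose: Callable[[str], List[str]],   # hypothetical stand-in for LLM proposals
    score: Callable[[str], float],         # hypothetical stand-in for the reward model
    iterations: int = 100,
    c: float = 1.4,
) -> Tuple[str, float]:
    root = Node(root_candidate)
    best, best_score = root_candidate, score(root_candidate)
    for _ in range(iterations):
        # 1. Selection: follow the highest-UCT child down to a leaf.
        node = root
        while node.children:
            node = max(node.children, key=lambda ch: ch.uct(c))
        # 2. Expansion: expand a leaf once it has been visited (or if it is the root).
        if node.visits > 0 or node.parent is None:
            node.children = [Node(cand, parent=node) for cand in propose(node.candidate)]
            if node.children:
                node = node.children[0]
        # 3. Evaluation: the reward model scores the selected candidate.
        reward = score(node.candidate)
        if reward > best_score:
            best, best_score = node.candidate, reward
        # 4. Backpropagation: update visit counts and values up to the root.
        while node is not None:
            node.visits += 1
            node.value_sum += reward
            node = node.parent
    return best, best_score


# --- Toy usage (all names below are hypothetical) ---
TARGET = "CCO"  # made-up "true" structure for the toy scorer


def toy_propose(s: str) -> List[str]:
    # Extend the candidate by one atom symbol, up to length 3.
    return [s + a for a in "CON"] if len(s) < 3 else []


def toy_score(s: str) -> float:
    # Fraction of positions that agree with TARGET.
    matches = sum(a == b for a, b in zip(s, TARGET))
    return matches / max(len(s), len(TARGET))


best, best_score = mcts_search("", toy_propose, toy_score, iterations=200)
print(best, best_score)
```

The key design point is that the search never asks the LLM to judge its own answers: every node's value comes from the external `score` function, which is the role the molecule-spectrum scorer plays in K-MSE.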
The environment setup is detailed in the environment.yml file.
- The implementation of K-MSE is in the `k-mse` folder.
- The baseline implementation is in the `baseline` folder.
We gratefully acknowledge the use of code from the following projects: MolPuzzle and MCTSr.
Please cite this work as follows:
@inproceedings{zhuang-etal-2025-boosting,
title = "Boosting {LLM}{'}s Molecular Structure Elucidation with Knowledge Enhanced Tree Search Reasoning",
author = "Zhuang, Xiang and
Wu, Bin and
Cui, Jiyu and
Feng, Kehua and
Li, Xiaotong and
Xing, Huabin and
Ding, Keyan and
Zhang, Qiang and
Chen, Huajun",
editor = "Che, Wanxiang and
Nabende, Joyce and
Shutova, Ekaterina and
Pilehvar, Mohammad Taher",
booktitle = "Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
month = jul,
year = "2025",
address = "Vienna, Austria",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.acl-long.1100/",
pages = "22561--22576",
ISBN = "979-8-89176-251-0",
abstract = "Molecular structure elucidation involves deducing a molecule{'}s structure from various types of spectral data, which is crucial in chemical experimental analysis. While large language models (LLMs) have shown remarkable proficiency in analyzing and reasoning through complex tasks, they still encounter substantial challenges in molecular structure elucidation. We identify that these challenges largely stem from LLMs' limited grasp of specialized chemical knowledge. In this work, we introduce a Knowledge-enhanced reasoning framework for Molecular Structure Elucidation (K-MSE), leveraging Monte Carlo Tree Search for test-time scaling as a plugin. Specifically, we construct an external molecular substructure knowledge base to extend the LLMs' coverage of the chemical structure space. Furthermore, we design a specialized molecule-spectrum scorer to act as a reward model for the reasoning process, addressing the issue of inaccurate solution evaluation in LLMs. Experimental results show that our approach significantly boosts performance, particularly gaining more than 20{\%} improvement on both GPT-4o-mini and GPT-4o."
}