TY - JOUR
T1 - Sc2Mol: a scaffold-based two-step molecule generator with variational autoencoder and transformer
AU - Liao, Zhirui
AU - Xie, Lei
AU - Mamitsuka, Hiroshi
AU - Zhu, Shanfeng
N1 - Publisher Copyright: © The Author(s) 2022. Published by Oxford University Press.
PY - 2023/1/1
Y1 - 2023/1/1
AB - MOTIVATION: Finding molecules with desired pharmaceutical properties is crucial in drug discovery. Generative models can be an efficient tool to find desired molecules through the distribution learned by the model to approximate given training data. Existing generative models (i) do not consider backbone structures (scaffolds), resulting in inefficiency or (ii) need prior patterns for scaffolds, causing bias. Scaffolds are reasonable to use, and it is imperative to design a generative model without any prior scaffold patterns. RESULTS: We propose a generative model-based molecule generator, Sc2Mol, without any prior scaffold patterns. Sc2Mol uses SMILES strings for molecules. It consists of two steps: scaffold generation and scaffold decoration, which are carried out by a variational autoencoder and a transformer, respectively. The two steps are powerful for implementing random molecule generation and scaffold optimization. Our empirical evaluation using drug-like molecule datasets confirmed the success of our model in distribution learning and molecule optimization. Also, our model could automatically learn the rules to transform coarse scaffolds into sophisticated drug candidates. These rules were consistent with those for current lead optimization. AVAILABILITY AND IMPLEMENTATION: The code is available at https://github.com/zhiruiliao/Sc2Mol. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
UR - http://www.scopus.com/inward/record.url?scp=85146364155&partnerID=8YFLogxK
U2 - 10.1093/bioinformatics/btac814
DO - 10.1093/bioinformatics/btac814
M3 - Article
C2 - 36576008
AN - SCOPUS:85146364155
SN - 1367-4803
VL - 39
SP - 1
EP - 9
JO - Bioinformatics (Oxford, England)
JF - Bioinformatics (Oxford, England)
IS - 1
ER -