Conditioning of Reinforcement Learning Agents and Its Policy Regularization Application

Abstract: The outcome of Jacobian singular values regularization was studied for supervised learning problems. It also was shown that Jacobian conditioning regularization can help to avoid the ``mode-collapse'' problem in Generative Adversarial Networks. In this paper, we try to answer the following question: Can information about policy conditioning help to shape a more stable and general policy of reinforcement learning agents? To answer this question, we conduct a study of Jacobian conditioning behavior during policy optimization. To the best of our knowledge, this is the first work that research condition number in reinforcement learning agents. We propose a conditioning regularization algorithm and test its performance on the range of continuous control tasks. Finally, we compare algorithms on the CoinRun environment with separated train end test levels to analyze how conditioning regularization contributes to agents' generalization..

Published on: ICML 2020, BIGRL Workshop


​​Deep learning enables rapid identification of potent DDR1 kinase inhibitors

Abstract: We have developed a deep generative model, generative tensorial reinforcement learning (GENTRL), for de novo small-molecule design. GENTRL optimizes synthetic feasibility, novelty, and biological activity. We used GENTRL to discover potent inhibitors of discoidin domain receptor 1 (DDR1), a kinase target implicated in fibrosis and other diseases, in 21 days. Four compounds were active in biochemical assays, and two were validated in cell-based assays. One lead candidate was tested and demonstrated favorable pharmacokinetics in mice.

Published on: Nature Biotechnology


Wasserstein-2 Generative Networks
Reinforced Adversarial Neural Computer for de Novo Molecular Design

Abstract: Generative Adversarial Networks training is not easy due to the minimax nature of the optimization objective. In this paper, we propose a novel end-to-end algorithm for training generative models which uses a non-minimax objective simplifying model training. The proposed algorithm uses the approximation of Wasserstein-2 distance by Input Convex Neural Networks. From the theoretical side, we estimate the properties of the generative mapping fitted by the algorithm. From the practical side, we conduct computational experiments which confirm the efficiency of our algorithm in various applied problems: image-to-image color transfer, latent space optimal transport, image-to-image style transfer, and domain adaptation.

Published on: ICLR 2021


Abstract: In silico modeling is a crucial milestone in modern drug design and development. Although computer-aided approaches in this field are well-studied, the application of deep learning methods in this research area is at the beginning. In this work, we present an original deep neural network (DNN) architecture named RANC (Reinforced Adversarial Neural Computer) for the de novodesign of novel small-molecule organic structures based on the generative adversarial network (GAN) paradigm and reinforcement learning (RL). As a generator RANC uses a differentiable neural computer (DNC), a category of neural networks, with increased generation capabilities due to the addition of an explicit memory bank, which can mitigate common problems found in adversarial settings. The comparative results have shown that RANC trained on the SMILES string representation of the molecules outperforms its first DNN-based counterpart ORGANIC by several metrics relevant to drug discovery.

Published on: Journal of Chemical Information and Modeling


Adversarial Threshold Neural Computer for Molecular de Novo Design

Abstract: In this article, we propose the deep neural network Adversarial Threshold Neural Computer (ATNC). The ATNC model is intended for the de novo design of novel small-molecule organic structures. The model is based on generative adversarial network architecture and reinforcement learning. ATNC uses a Differentiable Neural Computer as a generator and has a new specific block, called adversarial threshold (AT). AT acts as a filter between the agent (generator) and the environment (discriminator + objective reward functions). Furthermore, to generate more diverse molecules we introduce a new objective reward function named Internal Diversity Clustering (IDC). In this work, ATNC is tested and compared with the ORGANIC model. Both models were trained on the SMILES string representation of the molecules, using four objective functions (internal similarity, Muegge druglikeness filter, presence or absence of sp3-rich fragments, and IDC). The SMILES representations of 15K druglike molecules from the ChemDiv collection were used as a training data set. For the different functions, ATNC outperforms ORGANIC. Combined with the IDC, ATNC generates 72% of valid and 77% of unique SMILES strings, while ORGANIC generates only 7% of valid and 86% of unique SMILES strings. We also performed in vitro validation of the molecules generated by ATNC; results indicated that ATNC is an effective method for producing hit compounds.

Published on: Molecular Pharmaceutics