Empowering New Sources of Drug Discovery
01 When AI capabilities flow into thousands of industries
AI lets us handle not only large-scale life-science data but also scientific data. At its core, AI is a way of representing high-dimensional, complex functions. This lets it make better use of scientific laws, such as the equations of quantum mechanics and molecular mechanics, and solve physical equations more efficiently and accurately for simulation. For example, when designing drugs or materials, or when building aircraft, dams, bridges and other large engineering projects, we can first run computational simulations and confirm that no problems appear there before carrying out real experiments and physical design.
These technical breakthroughs will in turn change how design and industrial production are done at the microscopic scale. A new generation of microscopic-scale computing and design tools, driven by this underlying paradigm, will change drug R&D, materials R&D and many other industries.
In computational biology, drug design, materials design, chemical design and similar fields, people often hope to solve problems with computational simulation, but this is very hard to do in practice. Getting to the heart of these problems requires an effective description of the complex many-body interactions between microscopic particles, which ultimately means solving high-dimensional, complex differential equations. Some of these equations may have existed for more than 100 years, but effective computational and algorithmic tools for overcoming the curse of dimensionality have been lacking.
The curse of dimensionality means that the computational cost of solving the known equations grows exponentially with the number of inputs. A protein system, for example, starts at hundreds of thousands of inputs, and because the required computing power grows exponentially with that number, solving the equations directly is effectively impossible. In practice, computational simulation therefore has to introduce a large number of hand-crafted approximations and models.
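The exponential growth described above can be illustrated with a small sketch. It assumes a naive approach in which every input is sampled on a regular grid; the function name and the choice of 10 sample values per input are hypothetical and only serve to show the scaling.

```python
def grid_points(points_per_input: int, n_inputs: int) -> int:
    """Grid size when each of n_inputs variables is sampled at points_per_input values."""
    return points_per_input ** n_inputs


if __name__ == "__main__":
    # Hypothetical resolution: only 10 sample values per input.
    for n_inputs in (1, 3, 10, 100):
        n = grid_points(10, n_inputs)
        print(f"{n_inputs:>3} inputs -> about {float(n):.0e} grid points")
```

Even at this coarse resolution, a system with only 100 inputs would need more grid points than could ever be stored or evaluated, which is why direct solution is out of reach for systems with hundreds of thousands of inputs.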
These approximations make it hard for simulation accuracy to meet practical requirements, which has long been the biggest problem we face. The role of AI is to represent the interactions among electrons, atoms and molecules effectively, so that the curse of dimensionality can be overcome, simulations become more efficient and accurate, accuracy meets real-world requirements, and simulation can truly guide experiments.
AI for Industry trains models directly on the large volumes of data accumulated as an industry develops, in the hope of solving practical problems. Data scarcity is a problem here, however. Data in many industries has characteristics that make it hard to use with AI: sample sizes are very small, labels are very complex, or the dependency between the information in the data and the target is very complex.
The opportunities offered by AI for Science go well beyond directly fitting scientific data. Science develops by expressing scientific principles as a set of familiar physical laws and equations. What AI can offer is the ability to learn scientific principles or physical models and so solve physical equations effectively. This can then be applied to practical problems and can overcome many of the difficulties caused by data scarcity. In the biomedical industry, the more valuable a target or system is, the scarcer its data tends to be.
Therefore, computational simulation can bring many new possibilities, and AI can make computational simulation faster and more accurate.
02 Biomedicine embraces AI and creates more possibilities for the field
Building on AI's capabilities in scientific applications, a series of new tools has gradually been developed, especially for drug design. What drug R&D needs is not one or two core computing tools or headline features but a whole system of solutions, refined through continuous iteration until it truly serves the industry.
Protein structure prediction is a common task in drug research. At present, data in some areas of drug design, such as RNA-related drug R&D, is relatively scarce, so models do not yet perform well enough there. On the one hand we need to keep improving the models; on the other, we need solutions that integrate them better with simulation and experiment.
Uni-Fold reproduced the whole process of protein structure prediction, from training to prediction to product, and achieved better results on some metrics. We have also released to the open-source community the training code, data and corresponding models needed for multimers and many other complex cases, in the hope of further advancing drug research.
Binding sites are another dimension that matters for drug design. Although the AI model's overall predictions are very good, some local regions are still deficient. They need further refinement with simulation methods, and the most common problem simulation faces is time scale.
Large conformational changes in proteins often take a long time to simulate, so we use the RiD method: a neural network represents the free energy as a function of high-dimensional collective variables, and that free energy is then used to accelerate the simulation. Combined with AI prediction, this lets us further refine the protein conformation and obtain a better structure.
In many cases, drug design needs to take allostery into account. AI model predictions give us conformational information, and enhanced sampling is also needed to help find allosteric sites. In one case, for example, the allosteric site is located in the lower left corner. With conventional simulation, the high energy barrier keeps the system's conformation stuck at the orthosteric position for most of the simulation time, for example 50 nanoseconds. With AI-based enhanced sampling, however, the system's allosteric sites can be sampled quickly and broadly.
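Why a high barrier traps a conventional simulation can be sketched with a Boltzmann-factor estimate. This is a generic textbook estimate, not the enhanced-sampling method used in the work described here. It assumes that the crossing rate scales as exp(-barrier / RT) with an unchanged prefactor, and the barrier heights in the usage section are hypothetical.

```python
import math

R_KCAL = 0.0019872041  # gas constant in kcal/(mol*K)


def boltzmann_factor(barrier_kcal: float, temperature_k: float = 300.0) -> float:
    """Relative weight exp(-barrier / RT) of reaching the top of a barrier."""
    return math.exp(-barrier_kcal / (R_KCAL * temperature_k))


def crossing_speedup(barrier_kcal: float, biased_barrier_kcal: float,
                     temperature_k: float = 300.0) -> float:
    """Estimated rate gain when a bias lowers the barrier (same prefactor assumed)."""
    return (boltzmann_factor(biased_barrier_kcal, temperature_k)
            / boltzmann_factor(barrier_kcal, temperature_k))


if __name__ == "__main__":
    # Hypothetical numbers: a bias lowers a 10 kcal/mol barrier to 4 kcal/mol.
    print(f"estimated speedup: {crossing_speedup(10.0, 4.0):.0f}x")
```

Under these assumptions, lowering the effective barrier by even a few kcal/mol changes the expected crossing rate by orders of magnitude, which is the effect enhanced sampling relies on.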
In one drug R&D case, we found covalently bound drugs at the orthosteric site of the system. Covalent drugs tend to have poor selectivity, however, because they are relatively reactive and easily end up at other, unrelated sites of different types. To address this, we identified a more suitable allosteric site and designed non-covalent drugs for it, with stronger activity. This work also depends on effectively combining AI structure prediction with enhanced-sampling simulation.
Beyond AI models, simulation methods are also very important for structure determination from cryo-electron microscopy (cryo-EM). A cryo-EM density map, for example, acts as a constraint on the final structure of the protein system. With simulation, the system can be fitted well into the density map. The Uni-Fold prediction serves directly as the initial condition for structure determination; combined with the experimental data, a final MD run under the density-map constraint gives us the best structure.
Once the structure and target are determined, large-scale virtual screening is needed. Docking has been widely used in many fields over the past ten years. In today's high-performance computing environment, however, it needs to be optimized as far as possible, which means moving every part of it to the GPU. Docking combines a global search over docking conformations with local optimization, and exploiting GPU characteristics allows further tuning: for example, the global exploration parameters can be made larger and the local optimization more parallel.
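The "global search plus local optimization" structure can be sketched on a toy problem. Everything in this sketch is an assumption made for illustration: `score` is a made-up two-dimensional function standing in for a docking score (lower is better), the local optimizer is a simple pattern search, and nothing runs on a GPU. The point is only that each start is refined independently, so more starts mean broader exploration and the refinements can run in parallel.

```python
import math
import random


def score(x: float, y: float) -> float:
    """Toy stand-in for a docking score with many local minima (NOT a real one)."""
    return (x * x + y * y) / 10.0 - math.cos(3.0 * x) - math.cos(3.0 * y)


def local_optimize(x: float, y: float, step: float = 0.5, min_step: float = 1e-4):
    """Pattern search: accept improving axis moves, halve the step when none helps."""
    best = score(x, y)
    while step > min_step:
        improved = False
        for dx, dy in ((step, 0.0), (-step, 0.0), (0.0, step), (0.0, -step)):
            s = score(x + dx, y + dy)
            if s < best:
                x, y, best = x + dx, y + dy, s
                improved = True
        if not improved:
            step /= 2.0
    return x, y, best


def dock(n_starts: int = 64, seed: int = 0, box: float = 5.0):
    """Global stage: random starts in a box. Local stage: refine each start."""
    rng = random.Random(seed)
    starts = [(rng.uniform(-box, box), rng.uniform(-box, box)) for _ in range(n_starts)]
    # Each refinement is independent of the others, so this loop is parallelizable.
    results = [local_optimize(x, y) for x, y in starts]
    return min(results, key=lambda r: r[2])


if __name__ == "__main__":
    x, y, s = dock()
    print(f"best pose ({x:.3f}, {y:.3f}) with score {s:.3f}")
```

In this toy setting, `n_starts` plays the role of a global exploration parameter: raising it costs more independent refinements, which is exactly the kind of work that parallel hardware absorbs well.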
After a series of GPU-specific optimizations, performance at the same accuracy improved greatly. With parallel scheduling across 100 NVIDIA V100 GPUs, multi-level molecular docking of a database of 38 million molecules takes only 11.3 hours.
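The figures above imply an average throughput that is easy to work out. The sketch below assumes the work is spread evenly over the GPUs and that every molecule is counted once; the function name is hypothetical, and the result is a back-of-the-envelope average rather than a measured per-stage rate.

```python
def docking_throughput(molecules: int, gpus: int, hours: float) -> dict:
    """Average throughput implied by a total molecule count, GPU count and wall time."""
    gpu_hours = gpus * hours
    per_gpu_hour = molecules / gpu_hours
    return {
        "gpu_hours": gpu_hours,
        "molecules_per_gpu_hour": per_gpu_hour,
        "molecules_per_gpu_second": per_gpu_hour / 3600.0,
    }


if __name__ == "__main__":
    stats = docking_throughput(molecules=38_000_000, gpus=100, hours=11.3)
    for name, value in stats.items():
        print(f"{name}: {value:,.2f}")
```

With the numbers quoted in the text, this comes to about 1,130 GPU-hours in total, or roughly 33,600 molecules per GPU-hour (a little over 9 molecules per GPU per second).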
Diseases that involve crossing the blood-brain barrier, among others, require relatively small molecules. For some specific disease types, the possible molecules no longer need to be explored by trial and error and can essentially be enumerated for screening, which is another new possibility opened up by combining extreme computing power with suitable algorithms.
After large-scale screening and activity confirmation, the drug needs further modification to meet ADME/T and other optimization requirements while retaining its activity.
Uni-FEP can quantitatively calculate the change in binding free energy before and after a drug molecule is modified. Its calculations now reach chemical accuracy, which greatly reduces the experimental cost and time needed to synthesize molecules.
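For readers unfamiliar with free energy perturbation, the underlying idea can be sketched with the textbook Zwanzig estimator and the usual thermodynamic cycle. This is not Uni-FEP's implementation: the function names are hypothetical, and the energy differences in the usage section are made-up numbers rather than simulation output.

```python
import math

KB_KCAL = 0.0019872041  # Boltzmann constant per mole, in kcal/(mol*K)


def zwanzig_delta_g(delta_u_samples, temperature_k: float = 300.0) -> float:
    """Zwanzig estimator: dG(A->B) = -kT * ln < exp(-(U_B - U_A) / kT) >_A.

    delta_u_samples holds U_B - U_A (kcal/mol) evaluated on configurations
    sampled from state A.
    """
    kt = KB_KCAL * temperature_k
    exponents = [-du / kt for du in delta_u_samples]
    shift = max(exponents)  # log-sum-exp shift for numerical stability
    mean_exp = sum(math.exp(e - shift) for e in exponents) / len(exponents)
    return -kt * (shift + math.log(mean_exp))


def relative_binding_free_energy(dg_complex: float, dg_solvent: float) -> float:
    """Thermodynamic cycle: ddG_bind(A->B) = dG_complex(A->B) - dG_solvent(A->B)."""
    return dg_complex - dg_solvent


if __name__ == "__main__":
    # Made-up energy differences (kcal/mol), for illustration only.
    dg_complex = zwanzig_delta_g([-2.1, -1.8, -2.4, -2.0])
    dg_solvent = zwanzig_delta_g([-0.9, -1.2, -1.0, -1.1])
    ddg = relative_binding_free_energy(dg_complex, dg_solvent)
    print(f"ddG_bind = {ddg:.2f} kcal/mol")
```

A negative result means the modified molecule is predicted to bind more strongly than the original. Production FEP workflows typically split the transformation into many intermediate states and use more robust estimators than this single-step form.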
03 AI and computing power demands stack up, and the cloud is the trend
A complete set of computing solutions now covers every stage of drug R&D. As application scenarios deepen, computing solutions will face many complex scenarios, and that complexity places new requirements on bringing the solutions to industrial scale. Meanwhile, computing infrastructure is changing rapidly. Under large-scale demand, the underlying performance characteristics, whether to invest in performance optimization, and whether to migrate all become very important cost considerations.
Based on these solutions, a pipeline has formed in drug R&D: a computational solution spanning a series of steps from structure to dynamics, drug discovery, and establishing structure-activity relationships. Its logic is simple, and it divides mainly into data-driven and simulation-driven parts.
These solutions require high elasticity. On top of that, different solutions make very different demands on data: simulation mostly needs a lot of computing power, for example, while cryo-EM data is very large. This kind of elasticity and flexibility was hard to achieve with earlier computing solutions, so moving to the cloud is the general trend.
As the business deepens, for example when customers use Shenshi Technology's drug R&D platform, demand for private deployment is very typical and very large. With the computing nest solution, users can focus on the software solutions their business needs and leave private deployment to the cloud.
Advances in computing power, data and algorithms gave birth to AI. As AI continues to develop, it needs to make effective use of physical laws in order to open up more possibilities from the ground up.