Uncertainty-Quantified Hybrid Machine Learning/Density Functional Theory High Throughput Screening Method for Crystals


Computational high-throughput screening (HTS) has emerged in recent years as an important tool in materials science for accelerating the discovery of new materials with target properties. However, despite many cases in which HTS has led to novel discoveries, its major bottleneck remains the large computational cost of density functional theory (DFT) calculations, which scale cubically with system size and thus limit the chemical space that can be explored. The present work addresses this computational burden by presenting a machine learning (ML) framework that can explore the chemical space efficiently. Our model builds on an existing crystal graph convolutional neural network (CGCNN) for predicting the formation energy of a crystal structure, but is modified to allow uncertainty quantification for each prediction by using the hyperbolic tangent activation function and the dropout algorithm (CGCNN-HD). Uncertainty quantification is particularly important because typical usage of CGCNN (owing to the lack of a gradient implementation) does not involve structural relaxation, which can cause substantial prediction errors. The proposed method is benchmarked against an existing application that identified a promising photoanode material among >7,000 hypothetical Mg–Mn–O ternary compounds using all-DFT HTS (DFT-HTS). In our approach, we perform an approximate HTS using CGCNN-HD and refine the results for the selected candidates using full DFT (denoted ML/DFT-HTS). The proposed hybrid model reduces the number of required DFT calculations by a factor of >50 compared with the previous DFT-HTS while making the same discovery of Mg2MnO4, an experimentally validated new photoanode material. Further analysis demonstrates that adding the HD components with uncertainty measures increased the discoverability of promising materials relative to all-DFT HTS from 30% (CGCNN) to 68% (CGCNN-HD).
The present ML/DFT-HTS with uncertainty quantification can thus be a fast alternative to DFT-HTS for efficient exploration of the vast chemical space.
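
The idea of dropout-based uncertainty feeding a DFT shortlist can be illustrated with a minimal sketch. This is not the CGCNN-HD implementation: a toy tanh network with random weights stands in for the graph network, the function names (`mc_dropout_predict`, `select_for_dft`), the dropout rate, and the selection rule (keep a candidate if its mean prediction minus `k` standard deviations falls below an energy threshold) are all assumptions made for illustration.

```python
import numpy as np

def mc_dropout_predict(x, weights, rng, p_drop=0.2, n_samples=100):
    """Mean and std of predictions over stochastic forward passes.

    x: (n_candidates, n_features) placeholder descriptors.
    weights: (W1, b1, w2, b2) of a toy one-hidden-layer tanh network.
    """
    W1, b1, w2, b2 = weights
    hidden = np.tanh(x @ W1 + b1)            # hyperbolic tangent activation
    keep = 1.0 - p_drop
    samples = np.empty((n_samples, x.shape[0]))
    for t in range(n_samples):
        # Dropout stays active at prediction time; each pass uses a new mask.
        mask = rng.random(hidden.shape) < keep
        samples[t] = (hidden * mask / keep) @ w2 + b2
    return samples.mean(axis=0), samples.std(axis=0)

def select_for_dft(mean, std, threshold, k=1.0):
    """Indices whose optimistic estimate (mean - k*std) is <= threshold.

    Hypothetical rule: uncertain candidates get the benefit of the doubt
    and are passed on to full DFT instead of being discarded.
    """
    return np.flatnonzero(mean - k * std <= threshold)

rng = np.random.default_rng(0)
n_candidates, n_features, n_hidden = 1000, 8, 16
x = rng.normal(size=(n_candidates, n_features))      # stand-in for crystals
weights = (rng.normal(size=(n_features, n_hidden)) * 0.5,
           np.zeros(n_hidden),
           rng.normal(size=n_hidden) * 0.5,
           0.0)

mean, std = mc_dropout_predict(x, weights, rng)
shortlist = select_for_dft(mean, std, threshold=-1.0)
print(f"{shortlist.size} of {n_candidates} candidates sent to DFT")
```

Setting `k=0` recovers screening on the mean prediction alone; a larger `k` widens the shortlist toward candidates with uncertain predictions, at the cost of more DFT calculations.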

Journal of Chemical Information and Modeling, 2020, 60, 1996-2003