Ab initio calculation of real solids via neural network ansatz | Nature Communications

Mục Lục

Neural network for a solid system

Periodicity and anti-symmetry are two fundamental properties of the wavefunction of a solid system. The anti-symmetry can be ensured by the Slater determinant, which has been commonly used as the basic block in molecular neural networks. We also approximate the wavefunction by two Slater determinants of one spin-up channel and one spin-down channel,

$$\psi ({{{{{{{\bf{r}}}}}}}})={{{{{{{{\rm{Det}}}}}}}}}_{\uparrow }\left[{e}^{i{{{{{{{\bf{k}}}}}}}}\cdot {{{{{{{{\bf{r}}}}}}}}}_{\uparrow }}{u}_{{{{{{{{\rm{mol}}}}}}}}}^{\uparrow }(d)\right]{{{{{{{{\rm{Det}}}}}}}}}_{\downarrow }\left[{e}^{i{{{{{{{\bf{k}}}}}}}}\cdot {{{{{{{{\bf{r}}}}}}}}}_{\downarrow }}{u}_{{{{{{{{\rm{mol}}}}}}}}}^{\downarrow }(d)\right].$$

(1)

In this regard, our ansatz resembles the structure of FermiNet10,11, whereas other neural network wavefunction ansatz may include extra terms in addition to the Slater determinants12. Each determinant is then constructed from a set of periodic orbitals, which inherits the physics captured by the generalized collective Bloch function formed by a product of phase factor eik⋅r and collective molecular orbital umol. The generalized many-body Bloch function incorporates electron correlations and goes beyond single-electron approximation18.

Figure 1 displays more details on the structure of our neural network. Building an efficient and accurate periodic ansatz is the key step in developing ab initio methods for solids. Here we have followed the recently proposed scheme of Whitehead et al. to construct a set of periodic distance features d(r)20 using lattice vectors in real and reciprocal space (ai, bi),

$$d({{{{{{{\bf{r}}}}}}}})= \, \frac{\sqrt{{{{{{{{\bf{A}}}}}}}}{{{{{{{\bf{M}}}}}}}}{{{{{{{{\bf{A}}}}}}}}}^{T}}}{2\pi },\, {{{{{{{\bf{A}}}}}}}}=({{{{{{{{\bf{a}}}}}}}}}_{1},\, {{{{{{{{\bf{a}}}}}}}}}_{2},\, {{{{{{{{\bf{a}}}}}}}}}_{3}),\\ {{{{{{{{\bf{M}}}}}}}}}_{ij}= \, {f}^{2}({\omega }_{i}){\delta }_{ij}+g({\omega }_{i})g({\omega }_{j})(1-{\delta }_{ij}),\, {\omega }_{i}={{{{{{{\bf{r}}}}}}}}\cdot {{{{{{{{\bf{b}}}}}}}}}_{i}.$$

(2)

The periodic metric matrix M is constructed by periodic functions f, g, which retain ordinary distances at the origin and regulate them to periodic ones at far distances, ensuring asymptotic cusp form, continuity, and periodicity requirement at the same time.

Fig. 1: Sketch of neural network architecture.

The electron coordinates ri are passed to two channels. In the first one, they build the periodic distance features d(r) using the periodic metric matrix M and the lattice vectors a, and then d(r) features are fed into two molecular neural networks, that represent separately the real and the imaginary part of the wavefunction. In the second channel, ri constructs the plane-wave phase factors on a selected set of crystal momentum vectors. The total wavefunctions of solids are constructed by the two channels following the expression of Eq. (1).

Full size image

The constructed periodic distance features d(r) can then be fed into molecular neural networks to form collective orbitals umol. Specifically, in this work, we represent the molecular networks with FermiNet10, which incorporates electron–electron interactions. The inclusion of all-electron features promotes the traditional single-particle orbitals to the collective ones, and hence the description of wavefunction and correlation effects can be improved while fewer Slater determinants are required. In addition, the wavefunction of solid systems is necessarily complex-valued, and we introduce two sets of molecular orbitals to represent the real and imaginary parts of the solid wavefunction, respectively. The plane-wave phase factors eik⋅r in Fig. 1 are used to construct the Bloch function-like orbitals, and the corresponding k points are selected to minimize the Hartree–Fock (HF) energy.

Based on the variational principle, our neural network is trained using the variational Monte Carlo (VMC) approach. To efficiently optimize the network, a Kronecker-factored curvature estimator (KFAC) optimizer23 implemented by DeepMind team24 is modified and adopted, which significantly outperforms traditional energy minimization algorithms. Calculations are also ensured by efficient and massive parallelization on multiple nodes of high-performance GPUs. More details on the theories, methods, and computations are included in the Methods section and the supplementary information.

Hydrogen chain

Hydrogen chain is one of the simplest models in condensed matter research. Despite its simplicity, the hydrogen chain is a challenging and interesting system, serving as a benchmark system for electronic structure methods and featuring intriguing correlated phenomena25. The calculated energy of the periodic H10 chain as a function of the bond length is shown in Fig. 2a. The results from lattice-regularized diffusion Monte Carlo (LR-DMC) and traditional VMC are also plotted for comparison25. We can see that our results nearly coincide with the LR-DMC results and significantly outperform traditional VMC (see Supplementary Table 3). In Fig. 2b, the energy of hydrogen chains of different atom numbers are calculated for extrapolation to the thermodynamic limit (TDL). The shaded bar in Fig. 2b illustrates the extrapolated energy of the periodic hydrogen chain at TDL from auxiliary field quantum Monte Carlo (AFQMC), which is considered as the current state-of-the-art along with LR-DMC. Our TDL result is comparable with both AFQMC and LR-DMC (see Supplementary Table 4).

Fig. 2: Calculated results of neural network.

Our results are all labeled as Net. Statistical errors are negligible for the presented data. a H10 dissociation curve is plotted. b energy of different hydrogen chain sizes N, the bond length of the hydrogen chain is fixed at 1.8 Bohr. LR-DMC and VMC both use the cc-pVTZ basis set, and the one-body Jastrow function uses orbitals from LDA calculations. AFQMC is pushed to complete the basis limit. All the comparison results are taken from ref. 25. c Structure of graphene. d the cohesive energy per atom of Γ point and finite-size error corrected result is plotted. Experiment cohesive energy is from ref. 29. Graphene is calculated at its equilibrium length 1.421 Å. e Structure of rock-salt lithium hydride crystal. f Equation of state of LiH crystal is plotted, fitted Birch–Murnaghan parameters and experimental data are also given. HF corrections are calculated using cc-pVDZ basis, and ${E}_{\infty }^{{{{{{{{\rm{HF}}}}}}}}}$ is approximated by ${E}_{{{{{{{{\rm{N=8}}}}}}}}}^{{{{{{{{\rm{HF}}}}}}}}}$. The arrows denote the corresponding HF corrections. g Plot of homogeneous electron gas system. h Correlation error of 54-electrons HEG systems at different rs. Correlation error is defined as [1 − (E − EHF)/(Eref − EHF)] × 100%, and EHF is taken from ref. 33. DCD, BF-VMC, and TC-FCIQMC results are displayed for comparison, and BF-DMC data were used as reference33,34.

Full size image

Graphene

Graphene is arguably the most famous two-dimensional system (Fig. 2c) receiving broad attention in the past two decades for its mechanical, electronic, and chemical applications26. Here we carry out simulations to estimate its cohesive energy, which measures the strength of C-C chemical bonding and long-range dispersion interactions. The calculations are performed on a 2 × 2 supercell of graphene using twist average boundary condition (TABC)27 in conjunction with structure factor S(k) correction28 (see Supplementary Fig. 3) to reduce the finite-size error. The calculated results are plotted in Fig. 2d along with the experimental value29, and it shows that our neural network can deal with graphene very well, producing a cohesive energy of graphene within 0.1 eV/atom to the experimental reference (see Supplementary Table 6). We also plotted the results with periodic boundary conditions (PBC), namely the Γ point-only result, which deviates from the experiment data by 1.25 eV/atom.

Lithium hydride crystal

For a three-dimensional system, we consider the LiH crystal with a rock-salt structure (Fig. 2e), another benchmark system for accurate ab initio methods6,30,31. Despite consisting of only simple elements, LiH represents typical ionic and covalent bonds upon changing the lattice constants. Using our neural network, we first simulate the equation of the state of LiH on a 2 × 2 × 2 supercell, as shown in Fig. 2f. In addition, we employ a standard finite-size correction based on Hartree–Fock calculations of a large supercell (see Supplementary Fig. 5). In Fig. 2f we also show the Birch–Murnaghan fitting to the equation of state, based on which we can obtain thermodynamic quantities such as the cohesive energy, the bulk modulus, and the equilibrium lattice constant of LiH. As shown in the inset, our results on the thermodynamic quantities agree very well with experimental data30 (see Supplementary Table 8, 9).

For further validation, we have also computed directly the 3 × 3 × 3 supercell of LiH at its equilibrium length of 4.061 Å, which contains 108 electrons. To the best of our knowledge, this is the largest electronic system computed using a high-quality neural network ansatz. The 3 × 3 × 3 supercell calculation predicts the total energy per unit cell of LiH is −8.160 Hartree and the cohesive energy per unit cell is −4.770 eV after the finite-size correction (see Supplementary Table 10), which is also very close to the experimental value −4.759 eV30.

Homogeneous electron gas

In addition to the solids containing nuclei, our computational framework can also apply straightforwardly to model systems such as homogeneous electron gas (HEG). HEG has been studied for a long time to understand the fundamental behavior of metals and electronic phase transitions32. Several seminal ab initio works have reported the energy of HEG at different densities21,22,32,33,34,35. Recently two other works have extended neural network ansatz to study HEG21,22. Although our computational framework is independently designed for solids, the network structure between this work and refs. 21, 22 employ similar ideas. Different physics-inspired envelope functions and periodic features are used in these works, which suit the features of solids and homogeneous electron gas respectively. We make comparisons between these networks and ours on HEG, and observe consistent performance, which further proves the effectiveness of neural network-based QMC works. In this section, we present the results calculated on a simple cubic cell containing 54 electrons in a closed-shell configuration, the largest HEG system studied in this work (Fig. 2g). More results and comparisons with other works on smaller systems are discussed in the section “Network comparison” and Supplementary Table 13.

Figure 2h shows our calculated correlation error on the 54-electrons HEG at six different densities from rs = 0.5 Bohr to 20.0 Bohr. The state-of-the-art results, namely VMC with backflow correlation (BF)33, distinguishable cluster with double excitations (DCD)34, and transcorrelated full configuration interaction quantum Monte Carlo (TC-FCIQMC)35 are also plotted for comparison, and BF-DMC result is often used as the reference energy of correlation error. Overall, our neural network performs very well, with an error of less than 1% in a wide range of density (see Supplementary Table 14). Generally, the correlation error increases as the density of HEG decreases when the correlation effects become larger, which also appears in DCD calculations.

Electron density

Besides the total energy of solid systems, the electron density is also a key quantity to be calculated. For example, the electron density is crucial for characterizing different solids, such as the difference between insulators and conductors, and the distinct nature of ionic and covalent crystals. In DFT the one-to-one correspondence between many-body wavefunction and electron density proved by Hohenberg and Kohn in 1964 suggests that electron density is a fundamental quantity of materials. However, an interesting survey found that while new functionals in DFT improve the energy calculation the obtained density somehow can deviate from the exact36. Here, with our accurate neural network wavefunction, we can also obtain accurate electron density of solids and provide a valuable benchmark and guidance for method development.

A conductor features free-moving electrons, which would have macroscopic movements under external electric fields. In contrast, electrons are localized and constrained in insulators and cause considerable electron resistance. In Fig. 3, as an example, we show the calculated electron density of the hydrogen chain at two different bond lengths. As we can see, for the compressed hydrogen chain (L = 2 Bohr), the electron density is rather uniform and has small fluctuations. As the chain is stretched, the electrons are more localized and the density profile has larger variations. The observation is consistent with the well-known insulator-conductor transition on the hydrogen chain by varying the bond length. To measure the transition more quantitatively, we further calculate the complex polarization Z as the order parameter for insulator-conductor transition37. A conducting state is characterized by a vanishing complex polarization modulus ∣Z∣ ~0, while an insulating state has a finite ∣Z∣ ~1. We can see that the insulator-conductor transition bond length of the hydrogen chain is around 3 Bohr according to the calculated results, which is also consistent with the previous studies37.

Fig. 3: Electron density of H10 chains.

The horizontal axis is scaled by the corresponding bond length. Complex polarization modulus ∣Z∣ as a function of bond length is plotted in the inset.

Full size image

Ionic and covalent bonds are the most fundamental chemical bonds in solids. While the physical pictures of these two types of bonding are very different, they both lie in the behavior of electrons as the “quantum glue” and electron density distribution is a simple way to visualize different bonding types. Here we choose to calculate the electron density of the diamond-structured Si, rock-salt NaCl and LiH crystals at their equilibrium position. They are representative of covalent and ionic crystals, and have also been investigated by other high-level wavefunction methods, e.g., AFQMC38. Note that in the calculations of NaCl and Si, correlation-consistent effective core potential (ccECP) is employed to reduce the cost, which removes the inertia of core electrons and keeps the behavior of active valence electrons15,39.

The electron density of diamond-structured Si in its $(01\bar{1})$ plane is plotted in Fig. 4b. We can see that valence electrons are shared by the nearest Si atoms, forming apparent Si-Si covalent bonds. In contrast, valence electrons are located around atoms in NaCl crystal as Fig. 4c shows. All the valence electrons are attracted around Cl atoms, forming effective Na+ and Cl− ions in the crystal. Moreover, the electron density of LiH crystal is also calculated and plotted in Fig. 4d. LiH crystal is a moderate system between a typical ionic and covalent crystal. According to the result, the electrons are nearly equally distributed near Li and H atoms for our network. Detailed Bader charge analysis40 manifests the atoms in the crystal become Li0.67+ and H0.67− ions, respectively (resolution ~0.015 Å), which slightly deviates from the stable closed-shell configuration (see Supplementary Note 7 for more details).

Fig. 4: Electron density of solids.

a Structures of solids, where the lattice planes for plotting electron densities are indicated. b Electron density of diamond-structured Si in its ($01\bar{1}$) plane, ccECP[Ne] is employed, and the bond length of Si equals 5.42 Å. c Electron density of NaCl crystal in its xy-plane, ccECP[Ne] is employed, and the bond length of NaCl equals 5.7 Å. d, the electron density of LiH crystal in its xy-plane, and the bond length of LiH equals 4.0 Å.

Full size image

Network comparison

In refs. 21, 22, neural networks are also used to simulate homogeneous electron gas system, employing a different choice of periodic feature function. In Fig. 5 we plot the correlation error computed on the 14-electrons HEG system, which can be compared with the results of other works. We can see that all three networks can go beyond the BF-DMC level for high-density systems. For all systems tested, our correlation errors are about 2% with the TC-FCIQMC result as the reference35, whereas the results of refs. 21, 22 are within 1%. It is understandable that the networks of refs. 21, 22 are specially designed for HEG systems, so slightly better accuracy can be achieved than our network. In their works, multiple phase factors eik⋅r are used in the constructed orbitals, which improve the expressiveness of the network. In comparison, our network contains an additional exponential decay term, which simulates the attraction between atoms and electrons in solids containing nuclei (see Methods section for more details). Furthermore, the choice of periodic distance, as well as the domains of the constructed wavefunction (complex or real-valued), are also different in these three works, which may add differences to their performance. In the future, it would be interesting to combine the insights learned from these three works and design a better network ansatz for periodic systems.

Fig. 5: Correlation error of 14-electrons HEG system at different rs.

Correlation error is defined as [1 − (E − EHF)/(Eref − EHF)] × 100%. WAP-Net refers to ref. 21 and FermiNet-HEG refers to ref. 22. BF-DMC results33,34 are displayed for comparison, and TC-FCIQMC data were used as reference35.

Full size image

Metallic lithium

We have also carried out preliminary calculations on metallic lithium. The real metal system remains a notoriously difficult task for accurate wavefunction approaches7,41,42,43,44. The zero gap of metal leads to a discontinuity in the Brillouin zone integral. As a consequence, a significantly larger simulation cell is required for metals than insulators to reach the thermodynamic limit. Shortcut approaches to simulate metals are proposed via employing a special twist angle7,43, which helps to reduce the simulation size and finite-size error. Here we employ our network to simulate lithium with a body-centered cubic (bcc) structure, which is a typical metal with zero gap. A 2 × 2 × 2 conventional cell of bcc-Li at Γ point is employed (see Supplementary Table 11). In Supplementary Table 12, we list the total energy and the cohesive energy computed. As expected, the error in cohesive energy of lithium with such a limited supercell is larger than in non-metallic solids such as LiH, and further developments are desired to treat the large finite-size errors in metal.