Self‑Generating Novel Gallium‑Based Materials via a Bayesian Optimization Framework Achieving 100% Uniqueness
A collaborative team from Flinders University and Khalifa University introduced a machine-learning-guided Bayesian optimization workflow that autonomously designs chemically valid gallium-containing compounds with tunable band gaps (0.5–3.5 eV). The workflow pairs a KNN surrogate model with SMACT validity screening, SHAP analysis, and DFT checks, and the team reports 100% uniqueness and high SMACT validity among the generated candidates.
In modern semiconductor industries, precise control of electronic structure—especially band‑gap engineering—is essential for applications ranging from photovoltaics to high‑frequency communication and quantum information systems, yet traditional materials discovery struggles to meet this demand.
The research team, led by Flinders University in collaboration with Khalifa University, proposed a machine‑learning‑guided Bayesian optimization (BO) framework that respects chemical feasibility while enabling inverse design of gallium‑based compounds with target electronic properties.
Dataset: Constructing a Chemical Learning Space from Real Materials Databases
The study leveraged the NOMAD and Materials Project databases, extracting composition and experimental band‑gap values for compounds such as Ga₄P₄, GaAs, GaN, and Ga₂O₃. After removing entries with missing or non‑physical band gaps and deduplicating, 1,578 valid compositions remained. Features were engineered, including element counts, formula length, and a binary indicator for gallium presence. The cleaned dataset spans band gaps from 0.0 to 5.92 eV (mean ≈1.8 eV, σ ≈1.6 eV) and was split 80/20 with composition‑level stratification and five‑fold cross‑validation.
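The cleaning and feature-engineering steps described above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the study's pipeline: the function names are hypothetical, "formula length" is assumed to mean the character length of the formula string, "non-physical" is assumed to mean a negative band gap, and the parser only handles flat formulas without parentheses.

```python
import re

# An element symbol followed by an optional integer or decimal amount.
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d+(?:\.\d+)?)?")


def parse_formula(formula):
    """Return {element: amount} for a flat formula such as 'Ga2O3'."""
    counts = {}
    for symbol, amount in _TOKEN.findall(formula):
        counts[symbol] = counts.get(symbol, 0.0) + (float(amount) if amount else 1.0)
    return counts


def featurize(formula):
    """Build the simple composition features named in the text."""
    counts = parse_formula(formula)
    return {
        "n_elements": len(counts),        # number of distinct elements
        "formula_length": len(formula),   # assumption: string length
        "has_ga": int("Ga" in counts),    # binary gallium indicator
        "counts": counts,
    }


def clean(records):
    """Drop missing, NaN, or negative band gaps, then keep the first row per formula."""
    seen, kept = set(), []
    for formula, gap in records:
        if gap is None or gap != gap or gap < 0:  # gap != gap is True only for NaN
            continue
        if formula in seen:
            continue
        seen.add(formula)
        kept.append((formula, gap))
    return kept
```

Calling featurize("Ga2O3") gives two distinct elements, a formula length of 5, and a gallium indicator of 1. A production pipeline would also need to handle parentheses, hydrates, and polymorphs that share a formula.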
Framework: Co‑Design of Machine Learning and Bayesian Optimization
Prediction Model Layer
Eight regression algorithms—linear models, SVR, random forest, gradient boosting, and K‑nearest neighbors (KNN) among others—were systematically evaluated. Non‑linear models outperformed linear ones, indicating strong non‑linearity between composition and band gap. KNN achieved the highest performance (R² = 0.812) and superior error metrics, leading to its selection as the surrogate model for BO due to its excellent local interpolation capability and stability across random splits.
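To make the "local interpolation" property concrete, here is a minimal sketch of K-nearest-neighbors regression together with the R² score used to compare models. It assumes plain Euclidean distance and unweighted averaging; the article does not state the study's distance metric, neighbor count, or weighting, so treat those as placeholders.

```python
import math


def knn_predict(train_X, train_y, x, k=3):
    """Predict by averaging the targets of the k training points closest to x."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

Because each prediction is an average of nearby training targets, the model never extrapolates beyond the range of band gaps it has seen, which is one reason it behaves stably as a surrogate in a bounded search.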
Bayesian Optimization Module
The BO workflow employs the KNN surrogate to guide the search for gallium‑containing compositions with desired band gaps, using an Expected Improvement acquisition function to balance exploration and exploitation. Constraints limit each candidate to at most four elements and enforce a minimum gallium fraction, ensuring relevance to the gallium‑based research focus.
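The acquisition step and the composition constraints can be sketched as follows. Several things here are assumptions rather than details from the study: the objective is taken to be a quantity to minimize (for example, the distance between the predicted and target band gap), the uncertainty sigma must be supplied by the caller because a KNN model has no built-in predictive variance (the spread of neighbor targets is one possible proxy), and the minimum gallium fraction of 0.1 is a hypothetical default since the article does not give the threshold.

```python
import math


def expected_improvement(mu, sigma, best):
    """Expected Improvement for minimization.

    mu and sigma are the surrogate's predicted mean and uncertainty for a
    candidate; best is the lowest objective value observed so far.
    """
    if sigma <= 0.0:
        return max(best - mu, 0.0)  # no uncertainty: plain improvement
    z = (best - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (best - mu) * cdf + sigma * pdf


def satisfies_constraints(composition, max_elements=4, min_ga_fraction=0.1):
    """Check the element-count limit and the minimum gallium fraction.

    min_ga_fraction is a hypothetical value chosen for illustration.
    """
    total = sum(composition.values())
    if total <= 0 or len(composition) > max_elements:
        return False
    return composition.get("Ga", 0.0) / total >= min_ga_fraction
```

The first term of the EI formula rewards candidates predicted to beat the current best (exploitation), while the second term rewards candidates whose prediction is uncertain (exploration).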
Chemical Constraint Filtering Layer
All generated candidates are screened with SMACT, which enforces charge balance, plausible oxidation states, and an electronegativity ordering consistent with those oxidation states (cations less electronegative than anions). This ensures that proposed materials are chemically plausible, not merely mathematically valid.
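The charge-balance part of such a filter can be illustrated with a small standalone check. This is not SMACT's API or implementation: the oxidation-state table below is a deliberately tiny, illustrative subset, and the electronegativity test is omitted.

```python
from itertools import product

# Illustrative subset of common oxidation states (not a complete table).
OXIDATION_STATES = {
    "Ga": (1, 3),
    "O": (-2,),
    "N": (-3,),
    "S": (-2,),
    "F": (-1,),
    "As": (-3, 3, 5),
    "Sb": (-3, 3, 5),
}


def charge_balanced(counts, tol=1e-6):
    """Return True if some assignment of oxidation states sums to zero charge.

    counts maps element symbols to amounts, e.g. {"Ga": 2, "O": 3}.
    Elements missing from the table are rejected rather than guessed.
    """
    elements = list(counts)
    if any(el not in OXIDATION_STATES for el in elements):
        return False
    for states in product(*(OXIDATION_STATES[el] for el in elements)):
        net = sum(q * counts[el] for q, el in zip(states, elements))
        if abs(net) < tol:
            return True
    return False
```

With this table, {"Ga": 2, "O": 3} passes because 2 × (+3) + 3 × (−2) = 0, whereas {"Ga": 1, "O": 1} fails for every allowed state. Rejections of this kind are what make a chemistry filter far stricter than the optimizer's own bounds.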
The framework also integrates explainable AI via SHAP analysis, revealing that melting point, electronegativity range, and electronegativity deviation are the most influential features for band‑gap prediction, aligning with established semiconductor physics.
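SHAP attributions are based on Shapley values, which the SHAP library estimates efficiently for real models. For intuition only, the sketch below computes exact Shapley values by brute force over every feature subset. It assumes a hypothetical value function that returns the model output when only a given subset of features is "present", and it is practical only for a handful of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, features):
    """Exact Shapley value of each feature for a set-valued function.

    value_fn takes a frozenset of feature names and returns a number.
    Cost grows as 2**len(features), so this is for toy cases only.
    """
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value_fn(s | {f}) - value_fn(s))
        phi[f] = total
    return phi
```

For a purely additive toy model each feature's Shapley value equals its own contribution, and in general the values sum to the difference between the full-model output and the empty-set baseline. That additivity is what lets feature contributions be ranked against one another.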
Accelerating Inverse Materials Design under Real Chemical Constraints
Model Performance Evaluation
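Five-fold cross-validation of the KNN model yields an average R² of 0.60 ± 0.07 and an RMSE of about 1.02 eV. This is lower than the R² of 0.812 reported in the model comparison, but the study treats it as acceptable generalization for a sparse chemical space.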
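A "mean ± standard deviation" figure of this kind comes from scoring the model on each held-out fold and summarizing the per-fold scores. The sketch below shows that bookkeeping with hypothetical function names. It uses a plain shuffled split and does not reproduce the composition-level stratification mentioned earlier, and fit_predict stands in for whatever model is being evaluated.

```python
import random
import statistics


def k_fold_indices(n, k=5, seed=0):
    """Shuffle indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def rmse(y_true, y_pred):
    """Root-mean-square error between two equal-length sequences."""
    return (sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)) ** 0.5


def cross_validate(X, y, fit_predict, score, k=5, seed=0):
    """Return (mean, standard deviation) of a score across k held-out folds.

    fit_predict(train_X, train_y, test_X) must return predictions for test_X.
    """
    scores = []
    for held_out in k_fold_indices(len(X), k, seed):
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        preds = fit_predict(
            [X[i] for i in train], [y[i] for i in train], [X[i] for i in held_out]
        )
        scores.append(score([y[i] for i in held_out], preds))
    return statistics.mean(scores), statistics.stdev(scores)
```

Passing an R² function as score instead of rmse yields the other reported metric from the same folds.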
Feature-importance plots in the study confirm the dominant role of melting point, electronegativity range, and electronegativity deviation.
Learning Real Chemical Rules
During the BO search, 1,025 candidate gallium compositions were proposed, of which only 38 (about 3.7%) passed SMACT filtering, demonstrating the strictness of the chemical feasibility constraints. These viable candidates cluster in the 2.0–2.5 eV band-gap region, in line with medium-gap gallium semiconductors such as Ga₂S₃ (≈2.5 eV) and well below wide-gap Ga₂O₃ (≈4.8 eV).
The algorithm preferentially explores known gallium families (Ga–O, Ga–N, Ga–As/Sb) while suggesting new intermediate stoichiometries, e.g., Ga₀.₅₁As₀.₁₆N₀.₂₄Sb₀.₁₀ and Ga₀.₁₇₁Sb₀.₁₇₅O₀.₃₆₇F₀.₂₈₆.
For wide‑gap materials (>3.0 eV), oxygen‑rich compounds are favored; for lower gaps (≈1.5–2.0 eV), sulfur, selenium, or phosphorus substitution reduces the gap, reflecting learned chemical rules consistent with experimental observations.
Capturing Structure‑Property Relationships
Using the Chemeleon‑dng model (Park et al.), crystal prototypes of the SMACT‑validated candidates were predicted, revealing predominantly tetrahedral and octahedral gallium coordination, consistent with known structures of Ga₂O₃, GaN, and GaSe.
Among the generated candidates, the study reports a band-gap hierarchy of oxides > chalcogenides > nitrides, which it describes as mirroring established semiconductor trends.
DFT Validation
Density-functional theory calculations were performed on ten SMACT-validated compounds. Comparing model-predicted with DFT-computed band gaps yields a mean absolute error (MAE) of 0.890 eV, an RMSE of 1.158 eV, and a median absolute error of 0.784 eV, which the study considers reasonable accuracy for early-stage screening.
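The three error metrics quoted above are straightforward to compute from paired predictions and reference values. The following is a minimal sketch with a hypothetical function name; the inputs would be the model-predicted and DFT-computed band gaps in eV.

```python
import statistics


def error_summary(predicted, reference):
    """Return MAE, RMSE, and median absolute error for paired values."""
    abs_err = [abs(p - r) for p, r in zip(predicted, reference)]
    return {
        "mae": statistics.mean(abs_err),
        "rmse": statistics.mean([e * e for e in abs_err]) ** 0.5,
        "median_ae": statistics.median(abs_err),
    }
```

RMSE is never smaller than MAE, and the gap between them widens when a few errors are much larger than the rest. The reported values (RMSE 1.158 eV, MAE 0.890 eV, median 0.784 eV) follow that pattern, which suggests that a minority of the ten compounds account for a disproportionate share of the error.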
Conclusion
The study demonstrates a new materials‑design paradigm for gallium‑based semiconductors: a synergistic workflow that combines machine‑learning modeling, Bayesian‑optimization search, and chemically constrained filtering to automatically generate novel, chemically valid compounds with targeted band gaps. Beyond gallium systems, the methodology is extensible to indium, tin, and lead‑free semiconductor families, marking a shift from empirical trial‑and‑error toward algorithmic generation in materials science.
HyperAI Super Neural
Deconstructing the sophistication and universality of technology, covering cutting-edge AI for Science case studies.