Abstract:
Base liquor of Baijiu contains a wide variety of trace components that interact in complex ways, so predicting its grade rapidly and simply is of great importance. To achieve rapid and accurate identification while accounting for interactions among multiple components and reducing computational complexity, this study developed an improved permutation combination population analysis (imPCPA) method for feature selection on near-infrared spectroscopy (NIRS) data. Using this method, the study built an NIRS model that discriminates the quality grades of base liquors of Nongxiangxing Baijiu. The model was developed from 687 samples spanning four quality grades. A combined feature selection strategy was applied to optimize wavelength selection. Interval partial least squares (iPLS) first removed uninformative variables from the preprocessed spectra, and imPCPA then refined the selection of wavelength points within the retained spectral intervals. Finally, an extreme gradient boosting (XGBoost) classification model was constructed for grade prediction. The final selection identified 32 characteristic wavelength points. Compared with the original algorithm, the improved method reduced computation time by 80%. The median prediction accuracy of the classification model reached 95.65% on the prediction set. The results demonstrate that this method addresses key limitations of conventional sensory evaluation in liquor analysis, namely strong subjectivity and poor reproducibility. It effectively captures synergistic interactions among multi-component substances while keeping the feature selection interpretable. The approach provides a reliable reference for rapid grade assessment of Baijiu base liquors.
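The iPLS screening step can be illustrated with a minimal sketch, which makes several assumptions. It uses NumPy and treats the four grades as an ordinal numeric response coded 0 to 3; the study's actual response coding, preprocessing, interval count and number of latent variables are not stated in the abstract. The spectrum is split into equal-width intervals, and an interval is kept when its cross-validated RMSE (RMSECV) from a PLS1 model is lower than that of the full-spectrum model, which is a common iPLS retention rule but is assumed here. The helper names (`pls1_fit`, `rmsecv`, `ipls_screen`) and the synthetic data are hypothetical. This is not the authors' implementation, and it does not cover the imPCPA or XGBoost stages.

```python
import numpy as np


def pls1_fit(X, y, n_comp):
    """Fit a PLS1 model with NIPALS; return (x_mean, y_mean, coefficients)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_comp):
        w = Xr.T @ yr
        norm = np.linalg.norm(w)
        if norm == 0:  # nothing left to explain
            break
        w = w / norm
        t = Xr @ w                   # scores for this latent variable
        tt = t @ t
        p = Xr.T @ t / tt            # X loadings
        qa = (yr @ t) / tt           # y loading
        Xr = Xr - np.outer(t, p)     # deflate X and y
        yr = yr - qa * t
        W.append(w)
        P.append(p)
        q.append(qa)
    if not W:
        return x_mean, y_mean, np.zeros(X.shape[1])
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    # Regression vector in centred X space: B = W (P^T W)^-1 q
    B = W @ np.linalg.solve(P.T @ W, q)
    return x_mean, y_mean, B


def rmsecv(X, y, n_comp, k=5):
    """k-fold cross-validated RMSE of a PLS1 model (folds assigned by index mod k)."""
    folds = np.arange(len(y)) % k
    sq_err = 0.0
    for f in range(k):
        train, test = folds != f, folds == f
        x_mean, y_mean, B = pls1_fit(X[train], y[train], n_comp)
        pred = (X[test] - x_mean) @ B + y_mean
        sq_err += float(np.sum((pred - y[test]) ** 2))
    return np.sqrt(sq_err / len(y))


def ipls_screen(X, y, n_intervals=10, n_comp=3, k=5):
    """Return full-spectrum RMSECV, per-interval (start, end, RMSECV), and kept intervals.

    Assumed retention rule: keep an interval whose RMSECV beats the full spectrum.
    """
    n_vars = X.shape[1]
    edges = [i * n_vars // n_intervals for i in range(n_intervals + 1)]
    full = rmsecv(X, y, n_comp, k)
    results = []
    for a, b in zip(edges[:-1], edges[1:]):
        err = rmsecv(X[:, a:b], y, min(n_comp, b - a), k)
        results.append((a, b, err))
    kept = [(a, b) for a, b, err in results if err < full]
    return full, results, kept


# Synthetic demo (hypothetical data): 200 "spectra" x 100 wavelengths,
# grades coded 0-3, with grade-related signal planted only in columns 40-49.
rng = np.random.default_rng(0)
y = rng.integers(0, 4, size=200).astype(float)
X = rng.normal(size=(200, 100))
X[:, 40:50] += y[:, None]

full, results, kept = ipls_screen(X, y)
print(f"full-spectrum RMSECV: {full:.3f}")
print("retained intervals:", kept)
```

On this synthetic data, the interval containing the planted signal should have the lowest RMSECV. In the study's pipeline, imPCPA then searches only among wavelength points inside the intervals that survive this screening, which is what keeps its combinatorial search tractable.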