Abstract: Traditional methods for measuring dry matter content (DMC) are slow and destroy the sample. To address this, a non-destructive method for detecting green bean DMC based on hyperspectral imaging and machine learning is proposed. The principal component analysis (PCA) algorithm is improved so that characteristic spectral bands are selected using both the explanatory power of each principal component for DMC and the cumulative variance contribution rate, and the key bands are then located through the loading coefficients, giving efficient feature selection. With the XGBoost model as the core prediction framework, random search and KFold cross-validation are used to optimise the hyperparameters, and the predicted values are transformed back from the standardised scale. The proposed PCA-XGBoost-RK combined model shows excellent prediction performance across different batches of green bean samples, especially at the early stage of maturity (coefficient of determination R² of 0.94, root mean square error of 0.20, and relative percentage difference of 3.58), and is significantly better than traditional detection methods. The model presents its predictions directly through visualisation, which further enhances the interpretability of the data. The experimental results show that hyperspectral imaging combined with machine learning has great application potential in green bean quality detection, and that the optimised PCA and improved XGBoost perform well in dimensionality reduction and modelling.
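The band-selection idea described in the abstract can be illustrated with a minimal sketch. The abstract does not state how "explanatory power" is measured or how it is combined with the cumulative variance contribution rate, so everything below is an assumption: explanatory power is taken as the squared Pearson correlation between a component's scores and DMC, candidate components are the leading ones needed to reach a variance threshold, and the key bands are those with the largest absolute loadings. The function name `select_bands`, its parameters, and the synthetic data are hypothetical.

```python
import numpy as np

def select_bands(X, y, var_threshold=0.99, n_components=3, bands_per_component=2):
    """Pick spectral bands from PCA loadings (illustrative assumptions only).

    X: (n_samples, n_bands) spectra; y: (n_samples,) DMC values.
    """
    Xc = X - X.mean(axis=0)
    # PCA via SVD: rows of Vt are the loading vectors, U * S are the scores.
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U * S
    explained = S**2 / np.sum(S**2)
    cumulative = np.cumsum(explained)
    # Candidates: the leading components needed to reach the variance threshold.
    n_keep = min(int(np.searchsorted(cumulative, var_threshold)) + 1, len(S))
    # Assumed "explanatory power": squared correlation of each score with DMC.
    r2 = np.array([np.corrcoef(scores[:, k], y)[0, 1] ** 2 for k in range(n_keep)])
    top = np.argsort(r2)[::-1][:n_components]
    # Key bands: largest absolute loadings on each selected component.
    bands = set()
    for k in top:
        idx = np.argsort(np.abs(Vt[k]))[::-1][:bands_per_component]
        bands.update(int(i) for i in idx)
    return sorted(bands), top, r2

# Synthetic demo: bands 0-3 carry large variance unrelated to DMC,
# while band 7 carries the DMC signal.
rng = np.random.default_rng(0)
u, t = rng.normal(size=200), rng.normal(size=200)
X = rng.normal(scale=0.01, size=(200, 20))
X[:, :4] += 3.0 * u[:, None]
X[:, 7] += t
y = 10.0 + t + 0.05 * rng.normal(size=200)

bands, top, r2 = select_bands(X, y, n_components=1, bands_per_component=1)
print("selected bands:", bands, "component:", top, "r2:", r2)
```

In this toy data the first component explains most of the variance but says little about DMC, which is the situation where ranking by variance alone would pick the wrong bands. The selected bands would then be passed to the regression model; that step is not shown here.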