Abstract:
To address the low efficiency and sample damage of traditional dry matter content (DMC) detection methods, a non-destructive method for detecting green bean DMC based on hyperspectral imaging and machine learning is proposed. The principal component analysis (PCA) algorithm is improved so that characteristic spectral bands are selected according to both the explanatory power of each principal component for DMC and the cumulative variance contribution rate; the key bands are then located through the loading coefficients, giving efficient feature selection. With the XGBoost model as the core prediction framework, random search and K-fold cross-validation are introduced to optimise the hyperparameters, and the predicted values are de-normalised. The proposed PCA-XGBoost-RK combined model showed excellent prediction performance on different batches of green bean samples, especially at the early stage of maturity (coefficient of determination R² = 0.94, root mean square error = 0.20, relative percentage difference = 3.58), and was significantly better than the traditional detection method. Visualisation of the predictions further enhanced the interpretability of the results. The experimental results show that hyperspectral imaging combined with machine learning has great application potential in green bean quality detection, and that the optimised PCA and improved XGBoost perform well in dimensionality reduction and modelling, respectively.
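
The band-selection idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the thresholds (`cum_target`, `min_abs_corr`, `top_k`), the use of correlation as the measure of "explanatory power for DMC", and all toy numbers are assumptions made for illustration. The sketch takes the PCA outputs as given (variance ratios, component-to-DMC correlations, loadings), keeps the leading components needed to reach a cumulative variance target, filters them by their association with DMC, and then reads off the wavelengths with the largest absolute loadings.

```python
from itertools import accumulate


def select_components(var_ratio, dmc_corr, cum_target=0.95, min_abs_corr=0.3):
    """Pick principal components using two criteria (both thresholds are assumed).

    1. Keep only the leading components needed to reach `cum_target`
       cumulative variance contribution.
    2. Among those, keep components whose absolute correlation with DMC
       is at least `min_abs_corr`.
    """
    cumulative = list(accumulate(var_ratio))
    # Number of leading components required to reach the variance target.
    n_keep = next(
        (i + 1 for i, c in enumerate(cumulative) if c >= cum_target),
        len(cumulative),
    )
    return [i for i in range(n_keep) if abs(dmc_corr[i]) >= min_abs_corr]


def key_bands(loadings, wavelengths, components, top_k=2):
    """Locate key bands: the `top_k` largest absolute loadings per component."""
    chosen = set()
    for c in components:
        ranked = sorted(
            range(len(wavelengths)),
            key=lambda j: abs(loadings[c][j]),
            reverse=True,
        )
        chosen.update(wavelengths[j] for j in ranked[:top_k])
    return sorted(chosen)


# Toy, hypothetical PCA outputs for five bands (wavelengths in nm).
var_ratio = [0.70, 0.20, 0.06, 0.04]    # variance contribution per component
dmc_corr = [0.10, 0.65, -0.45, 0.80]    # correlation of each component score with DMC
wavelengths = [450, 550, 650, 750, 850]
loadings = [
    [0.50, 0.40, 0.45, 0.44, 0.44],
    [0.10, -0.70, 0.20, 0.65, 0.15],
    [0.05, 0.10, -0.80, 0.20, 0.55],
    [0.60, 0.10, 0.10, 0.10, -0.78],
]

components = select_components(var_ratio, dmc_corr)
bands = key_bands(loadings, wavelengths, components)
print(components)  # [1, 2]
print(bands)       # [550, 650, 750, 850]
```

In this toy case the first component carries most of the variance but is dropped because it is only weakly related to DMC, while the fourth is strongly related but falls outside the cumulative variance cut-off. That is the behaviour a two-criterion selection is meant to produce. The selected bands would then feed the regression model.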