Multi-modal denoising method for structured light 3D reconstruction under composite noise

张立平; 胡晓峰; 潘飞文; 郭斌; 罗哉

doi:10.12086/oee.2026.250272

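Abstract: To address the difficulty of accurately recovering surface detail in structured light 3D imaging under composite noise, a deep denoising method that fuses multi-modal input with a structure-preserving mechanism is proposed. A dual-channel residual convolutional neural network takes the noisy fringe pattern and a serially filtered image as joint input, introduces multi-scale feature extraction and channel attention modules, and uses a hybrid loss function combining mean squared error, the structural similarity index, and a Laplacian edge term, so that noise suppression and edge preservation are optimized jointly. A binocular structured light experimental platform was built, and three types of samples, including plaster busts, were used for training and validation with multi-frequency phase shifting and the heterodyne method. The results show that, in the image domain, the peak signal-to-noise ratio after denoising increases by about 7 dB, the structural similarity index rises to 0.96, and the root mean square error decreases by about 61.6%; at the 3D reconstruction level, the point-to-plane root mean square error decreases by about 47.1% and point cloud density increases by 43.6%. The method suppresses multi-source noise while effectively preserving structural and edge features, significantly improving 3D reconstruction quality.

Extended abstract: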

Objective Structured light 3D reconstruction is widely used in high-precision metrology and industrial inspection because of its non-contact and high-resolution characteristics. In practical industrial environments, however, captured fringe patterns are often corrupted by composite noise originating from sensor electronics, surface reflectance variations, and ambient interference. Such noise degrades phase quality, introduces unwrapping errors, and limits the achievable reconstruction accuracy. To address these issues, this paper proposes a multi-modal deep denoising method that integrates a dual-channel input, a squeeze-and-excitation channel attention mechanism, and a hybrid loss function combining MSE, SSIM, and Laplacian edge constraints. The objective is to effectively suppress multi-source noise while preserving fringe continuity and edge structures, thereby enhancing phase retrieval stability, reducing unwrapping failures, and ultimately improving both image-domain quality and 3D reconstruction accuracy under complex industrial conditions.

Methods The proposed method employs a dual-channel residual convolutional neural network based on the DnCNN architecture. The input is constructed by concatenating the original noisy fringe pattern with a serially filtered image along the channel dimension. The serial filtering module first applies extreme value detection and selective median filtering to remove impulse noise, followed by two-dimensional Gaussian smoothing to suppress high-frequency random noise. Multi-scale convolutional layers with batch normalization and ReLU activation are then introduced to extract hierarchical fringe features from the dual-channel input, capturing both fine textures and coarse structures while preserving edge information.
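The construction of the dual-channel input can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the 3×3 window, the saturation thresholds used for extreme value detection, and the Gaussian width `sigma` are all assumptions chosen for illustration, and intensities are assumed to be normalized to [0, 1].

```python
import numpy as np

def selective_median(img, low=0.0, high=1.0, k=3):
    """Replace only suspected impulse pixels (at or beyond low/high) with the local median."""
    pad = k // 2
    padded = np.pad(img, pad, mode="reflect")
    h, w = img.shape
    # Stack the k*k shifted views so the median is taken per pixel.
    windows = np.stack([padded[dy:dy + h, dx:dx + w]
                        for dy in range(k) for dx in range(k)])
    median = np.median(windows, axis=0)
    impulse = (img <= low) | (img >= high)  # assumed form of extreme value detection
    return np.where(impulse, median, img)

def gaussian_smooth(img, sigma=1.0):
    """Separable 2D Gaussian smoothing with reflect padding."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-x**2 / (2 * sigma**2))
    kernel /= kernel.sum()
    padded = np.pad(img, radius, mode="reflect")
    # Filter along rows first, then along columns.
    rows = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, kernel, mode="valid"), 0, rows)

def dual_channel_input(noisy):
    """Concatenate the noisy fringe image and its serially filtered version: (2, H, W)."""
    filtered = gaussian_smooth(selective_median(noisy))
    return np.stack([noisy, filtered], axis=0)
```

Filtering only the flagged pixels in the median stage leaves uncorrupted fringe samples untouched, so the second channel carries a smoothed prior while the first keeps the raw measurement.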

Subsequently, squeeze-and-excitation (SE) channel attention modules are embedded after convolutional layers in each residual block to adaptively emphasize informative features. Each SE module performs global average pooling to compress spatial information, followed by a bottleneck mapping with ReLU and Sigmoid activation to generate channel-wise weights, which are then used to recalibrate the feature maps. A hybrid loss function is designed to balance global noise suppression and local structure preservation, combining mean squared error for luminance fidelity, structural similarity index for contrast and texture consistency, and a Laplacian-based edge loss to maintain high-frequency details and boundary sharpness.
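The SE recalibration and the three-term loss described above can be sketched as follows, as a forward-pass illustration in NumPy only. Several choices here are assumptions not given in the text: the bias-free bottleneck weights `w1` and `w2`, the loss weights `alpha`, `beta`, and `gamma`, the use of a single global window for SSIM (practical implementations typically use local windows), and the L1 distance between Laplacian responses for the edge term.

```python
import numpy as np

def se_recalibrate(feat, w1, w2):
    """Squeeze-and-excitation on a (C, H, W) feature map.
    w1: (C//r, C) reduction weights, w2: (C, C//r) expansion weights; biases omitted."""
    squeezed = feat.mean(axis=(1, 2))                # global average pooling -> (C,)
    hidden = np.maximum(w1 @ squeezed, 0.0)          # bottleneck mapping + ReLU
    weights = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))   # Sigmoid -> channel weights in (0, 1)
    return feat * weights[:, None, None]             # recalibrate each channel

def laplacian(img):
    """4-neighbour discrete Laplacian with reflect padding."""
    p = np.pad(img, 1, mode="reflect")
    return p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] - 4.0 * img

def global_ssim(x, y, data_range=1.0):
    """SSIM computed over one window covering the whole image (a simplification)."""
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / ((mx**2 + my**2 + c1) * (vx + vy + c2))

def hybrid_loss(pred, target, alpha=1.0, beta=0.1, gamma=0.1):
    """alpha * MSE + beta * (1 - SSIM) + gamma * Laplacian edge term (weights hypothetical)."""
    mse = np.mean((pred - target) ** 2)
    ssim_term = 1.0 - global_ssim(pred, target)
    edge = np.mean(np.abs(laplacian(pred) - laplacian(target)))
    return alpha * mse + beta * ssim_term + gamma * edge
```

The edge term responds only to differences in second-order structure, so a uniform brightness offset is penalized by the MSE and SSIM terms but not by the Laplacian term.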

Experiments are conducted on a binocular structured light system using three-frequency four-step phase shifting and heterodyne phase unwrapping. The left camera coordinate system is defined as the world coordinate system, and 3D coordinates are obtained via linear triangulation with calibrated camera and projector parameters. Training and validation data are collected from three representative objects: standard step blocks (regular geometries with sharp edges), plaster busts (smooth curved surfaces), and printed circuit boards (high-reflectivity components and fine textures). Reference images are obtained through multi-frame averaging.
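For reference, the four-step phase-shifting and heterodyne steps can be sketched as below. The sketch assumes the common frame model I_k = A + B cos(φ + kπ/2) for k = 0..3, which may differ in sign or ordering from the convention used in the experiments, and it shows only one pairwise heterodyne step; a three-frequency scheme would apply it repeatedly. All function names are illustrative.

```python
import numpy as np

def wrapped_phase_4step(i0, i1, i2, i3):
    """Wrapped phase from four frames I_k = A + B*cos(phi + k*pi/2), k = 0..3.
    I3 - I1 = 2B*sin(phi) and I0 - I2 = 2B*cos(phi), so arctan2 recovers phi."""
    return np.arctan2(i3 - i1, i0 - i2)

def heterodyne(phi_a, phi_b):
    """Beat phase of two wrapped phase maps, wrapped to [0, 2*pi)."""
    return np.mod(phi_a - phi_b, 2 * np.pi)

def equivalent_period(p_a, p_b):
    """Period of the beat fringe for fringe periods p_a < p_b (same units)."""
    return p_a * p_b / (p_b - p_a)
```

Because the background A and modulation B cancel in the arctangent ratio, additive noise in the frames enters the phase directly, which is why denoising is applied to the fringe images before this step.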

Results and Discussions Quantitative evaluations in the image domain show that the proposed method achieves a PSNR of 43.11 dB / SSIM of 0.9654 on the standard step block, 42.90 dB / 0.9627 on the plaster bust, and 42.69 dB / 0.9643 on the PCB. Compared with noisy fringe patterns, the average PSNR increases by approximately 7 dB, SSIM improves to about 0.96, and RMSE decreases by around 61.6%. Ablation studies confirm the contribution of each component: dual-channel input improves PSNR by approximately 1.05 dB over single-channel input; the SE attention module provides consistent gains in both PSNR and SSIM, especially in edge detail recovery; and the hybrid loss function achieves the best overall performance. Comparative experiments against traditional methods (NLM, BM3D) and learning-based methods (BM3D-Net, DIVA) demonstrate that the proposed method achieves the highest PSNR (42.90 dB), highest SSIM (0.9627), and lowest RMSE (0.0063) on the plaster bust dataset. Noise robustness analysis under varying Gaussian noise levels (σg = 5 to 20) shows that even at the strongest noise level (σg = 20), the proposed method maintains a PSNR of 27.6 dB and an SSIM of 0.8032, substantially outperforming the original DnCNN (25.6 dB, 0.7514) and noisy inputs (15.7 dB, 0.4057), thereby demonstrating high noise tolerance.
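The image-domain metrics follow their standard definitions, sketched here for clarity. The sketch assumes intensities normalized so that the peak value is `data_range` (1.0 by default); the normalization actually used for the reported figures is not stated in this abstract.

```python
import numpy as np

def rmse(ref, img):
    """Root mean square error between a reference image and a test image."""
    diff = np.asarray(ref, dtype=np.float64) - np.asarray(img, dtype=np.float64)
    return float(np.sqrt(np.mean(diff ** 2)))

def psnr(ref, img, data_range=1.0):
    """Peak signal-to-noise ratio in dB: 20 * log10(data_range / RMSE)."""
    err = rmse(ref, img)
    if err == 0.0:
        return float("inf")
    return float(20.0 * np.log10(data_range / err))

def relative_reduction(before, after):
    """Fractional decrease of an error metric, e.g. RMSE before and after denoising."""
    return (before - after) / before
```

Since PSNR is a logarithmic function of RMSE for a fixed `data_range`, the two metrics rank methods identically on a single image; averages over a dataset can still differ because the mean of a logarithm is not the logarithm of the mean.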

In the phase domain, denoised fringe patterns produce wrapped and absolute phase maps with significantly fewer phase jumps and fringe discontinuities, particularly in edge regions and high-reflectivity areas. Phase gradient analysis shows that the denoised histogram is more concentrated near zero, with a marked reduction in high-gradient outliers. In the 3D reconstruction domain, the point-to-plane RMSE decreases by 47.1%, the 95% quantile error decreases by 42.4%, the maximum error decreases by 16.7%, and point cloud density increases by 43.6%. Connectivity analysis reveals that high-error regions transition from large clustered patches to sparse isolated spots, indicating effective error tail suppression and spatial homogenization without introducing new artifacts. Region-wise error analysis further shows that RMSE and 95% quantile error decrease in both flat and edge regions, while edge metrics such as boundary accuracy, boundary completeness, and normal angle errors all improve. These results indicate that the proposed method suppresses noise while preserving key structural and edge details, with no observed edge rounding or structural weakening.
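The point-to-plane statistics reported above can be computed along the following lines. This is a sketch under an assumption: the reference plane is taken to be a least-squares fit to the points of a nominally flat region, whereas the abstract does not state how the reference surface was obtained. Function names are illustrative.

```python
import numpy as np

def fit_plane(points):
    """Least-squares plane through an (N, 3) point set: returns (centroid, unit normal)."""
    centroid = points.mean(axis=0)
    # The right singular vector with the smallest singular value is the plane normal.
    _, _, vt = np.linalg.svd(points - centroid, full_matrices=False)
    return centroid, vt[-1]

def point_to_plane_stats(points, centroid, normal):
    """RMSE, 95% quantile, and maximum of the unsigned point-to-plane distances."""
    d = np.abs((points - centroid) @ normal)
    return {
        "rmse": float(np.sqrt(np.mean(d ** 2))),
        "p95": float(np.quantile(d, 0.95)),
        "max": float(d.max()),
    }
```

Reporting the 95% quantile and the maximum alongside the RMSE separates the bulk error from the error tail, which is the distinction drawn in the connectivity analysis.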

Conclusions This paper proposes a deep denoising method for structured light fringe patterns that integrates multi-modal input and structure-preserving mechanisms. Based on an improved DnCNN framework, the method employs a dual-channel input combining the original noisy fringe pattern with a serially filtered image, introduces a squeeze-and-excitation channel attention mechanism, and adopts a hybrid loss function. Experiments on standard step blocks, plaster busts, and printed circuit boards demonstrate significant improvements in PSNR, SSIM, and RMSE, while maintaining texture preservation and fringe continuity. In the 3D reconstruction domain, point cloud density, accuracy, and structural fidelity are substantially enhanced. The proposed method provides a reliable preprocessing strategy for high-precision industrial measurement and defect detection.
