Abstract: To address the loss of high-frequency detail and the difficulty of maintaining cross-view consistency in light field super-resolution, this paper proposes a reconstruction framework that combines Mamba subspace scanning with diffusion-based generation. Bidirectional scanning over two subspaces, EPI-Mamba and SA-Mamba, efficiently captures the EPI structure and the spatial-angular correlations of the light field, respectively. The two serialized feature streams are then fed into a Multi-scale Cross Interaction (MCI) module for deep complementary fusion and coupling, after which a Spatial-Angular Modulation (SAM) module performs dual spatial-angular calibration of the fused result. On this basis, a frequency-domain enhancement mechanism uses FFT features to compensate high-frequency components, reinforcing informative feature relations while suppressing irrelevant or noisy information. The enhanced features are fed into a diffusion-model denoising network, and the super-resolution result is obtained after upsampling. Experiments show that the method performs best on multiple quantitative metrics and in visual evaluation, achieving the highest scores of 39.43 dB/0.987 (PSNR/SSIM) in the 2× task and 33.70 dB/0.945 in the more challenging 4× task, improving PSNR by up to 1.44 dB over existing methods. Qualitatively, it delivers notable gains in detail preservation and image sharpness, with a particularly clear advantage in recovering high-frequency textures and structures.

       

      Abstract:
      Objective Light field super-resolution (LFSR) aims to reconstruct high-resolution light field images from low-resolution observations while preserving both fine spatial details and angular consistency among multiple views. Because spatial and angular information are tightly coupled in light field data, reconstruction is challenged by both feature complexity and view dependency. Existing approaches often suffer from two major limitations. First, high-frequency textures, edge details, and subtle structural patterns are easily degraded during feature extraction and upsampling, resulting in blurred outputs. Second, insufficient modeling of spatial-angular correlations may introduce inconsistencies across viewpoints, impairing geometric fidelity and visual coherence. To address these issues, this paper proposes a light field super-resolution framework that integrates Mamba-based subspace scanning with diffusion-based generative reconstruction. The framework is designed to enhance high-frequency detail recovery, strengthen long-range spatial-angular dependency modeling, and improve reconstruction accuracy and cross-view consistency under different upscaling settings.
      Methods The proposed framework adopts a dual-branch subspace scanning strategy based on the Mamba architecture. Considering that light field images exhibit complementary characteristics in different subspaces, two specialized branches are constructed for efficient modeling. The first branch, termed EPI-Mamba, focuses on Epipolar Plane Image (EPI) structures, which explicitly characterize geometric relationships across viewpoints. This branch is therefore used to capture directional continuity and structural variation in epipolar dimensions. The second branch, termed Spatial-Angular Mamba (SA-Mamba), is designed to model correlations between spatial content and angular variation, enabling the network to learn dependencies that are difficult to represent using conventional convolution alone. Both branches perform bidirectional scanning in their respective subspaces, allowing efficient long-range dependency modeling while maintaining relatively low computational complexity.
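      The subspace serialization that feeds the two branches can be illustrated with a small NumPy sketch. The tensor layout (U, V, H, W) and the helper names below are hypothetical choices for exposition, not the paper's implementation; the Mamba state-space blocks that would consume these token sequences are omitted.

```python
import numpy as np

# Toy light field: U x V angular views, each H x W pixels (single channel).
U, V, H, W = 3, 3, 4, 4
lf = np.arange(U * V * H * W, dtype=np.float32).reshape(U, V, H, W)

def epi_sequences(lf):
    """Serialize horizontal EPI slices (fix view row u and pixel row h,
    vary v and w), returning forward/reversed token orders so a
    bidirectional scan can consume both directions."""
    U, V, H, W = lf.shape
    seqs = []
    for u in range(U):
        for h in range(H):
            epi = lf[u, :, h, :]            # (V, W) epipolar plane image
            fwd = epi.reshape(-1)           # forward scan order
            seqs.append((fwd, fwd[::-1]))   # bidirectional pair
    return seqs

def spatial_angular_sequence(lf):
    """Serialize the macro-pixel (spatial-angular interleaved) layout:
    the U x V angular samples of each spatial position stay adjacent."""
    sa = lf.transpose(2, 3, 0, 1).reshape(-1)  # (U,V,H,W) -> (H,W,U,V) -> flat
    return sa, sa[::-1]

epi = epi_sequences(lf)
sa_fwd, sa_bwd = spatial_angular_sequence(lf)
print(len(epi), epi[0][0].shape, sa_fwd.shape)
```

Keeping angular samples adjacent in the SA sequence (and epipolar lines contiguous in the EPI sequence) is what lets a 1-D sequence model capture each subspace's dependencies with short effective distances.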
      The serialized features extracted from the two branches are then fed into a Multi-scale Cross Interaction (MCI) module. This module promotes deep information exchange between EPI-aware features and spatial-angular features at multiple scales, thereby enhancing complementary fusion of geometric and texture information. To further refine the fused representations, a Spatial-Angular Modulation (SAM) module is introduced. This module jointly calibrates features from the spatial and angular perspectives, adaptively emphasizing informative responses and suppressing inconsistent activations. As a result, cross-view feature alignment is improved and the coherence of reconstructed light field content is strengthened.
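      The dual spatial-angular calibration idea can be sketched minimally as follows. The real SAM module is learned; the parameter-free pooling-and-sigmoid gates here are stand-ins that only illustrate how a spatial descriptor and an angular descriptor jointly rescale the fused features.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def spatial_angular_modulate(feat):
    """Sketch of dual calibration: gate the fused features with a spatial
    descriptor (pooled over the angular axes) and an angular descriptor
    (pooled over the spatial axes). feat: (U, V, H, W), single channel."""
    spatial_gate = sigmoid(feat.mean(axis=(0, 1), keepdims=True))  # (1,1,H,W)
    angular_gate = sigmoid(feat.mean(axis=(2, 3), keepdims=True))  # (U,V,1,1)
    return feat * spatial_gate * angular_gate

feat = np.random.default_rng(0).normal(size=(3, 3, 4, 4))
out = spatial_angular_modulate(feat)
print(out.shape)
```

Because both gates lie in (0, 1), responses that are weak in either the spatial or the angular descriptor are attenuated, which is the suppression-of-inconsistent-activations behavior described above.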
      To mitigate the loss of high-frequency details, a frequency-domain enhancement mechanism is incorporated into the framework. Specifically, Fast Fourier Transform (FFT) is applied to the fused features to obtain frequency-domain representations, from which informative high-frequency components are selectively enhanced. This process compensates for detail attenuation, reinforces discriminative structural responses, and suppresses irrelevant or noisy signals. The enhanced features are then input into a diffusion-based denoising network. Benefiting from the strong generative capability of diffusion models, the network progressively restores fine details through iterative denoising and reconstructs high-resolution light field images after upsampling. The cooperation between subspace-aware feature extraction and diffusion-based refinement enables the framework to balance reconstruction fidelity, structural accuracy, and perceptual quality.
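      The frequency-domain compensation can be sketched as a radial high-frequency boost on a 2-D feature map. The cutoff and gain below are illustrative fixed values, whereas the paper's enhancement selects and weights components adaptively.

```python
import numpy as np

def fft_high_freq_boost(feat, cutoff=0.25, gain=1.5):
    """Sketch of FFT-based enhancement: transform a 2-D feature map,
    amplify components beyond a normalized radial cutoff, invert."""
    H, W = feat.shape
    F = np.fft.fftshift(np.fft.fft2(feat))           # DC at the center
    fy = np.fft.fftshift(np.fft.fftfreq(H))[:, None]
    fx = np.fft.fftshift(np.fft.fftfreq(W))[None, :]
    radius = np.sqrt(fx**2 + fy**2)                  # normalized frequency
    mask = np.where(radius > cutoff, gain, 1.0)      # boost high bands only
    return np.fft.ifft2(np.fft.ifftshift(F * mask)).real

feat = np.random.default_rng(1).normal(size=(8, 8))
out = fft_high_freq_boost(feat)
print(out.shape)
```

With gain = 1.0 the round trip is an identity, so the operation degrades gracefully to a pass-through when no compensation is needed.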
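      The iterative denoising stage follows the standard DDPM-style reverse process. The toy version below is network-free (the noise predictor is a zero stand-in and the schedule is arbitrary) and only shows the shape of one reverse step; it is not the paper's sampler.

```python
import numpy as np

def ddpm_reverse_step(x_t, t, eps_pred, betas, rng):
    """One textbook DDPM reverse step: estimate the mean of x_{t-1}
    from x_t and the predicted noise, adding noise except at t = 0."""
    alphas = 1.0 - betas
    alpha_bar = np.cumprod(alphas)
    coef = (1.0 - alphas[t]) / np.sqrt(1.0 - alpha_bar[t])
    mean = (x_t - coef * eps_pred) / np.sqrt(alphas[t])
    if t > 0:
        mean += np.sqrt(betas[t]) * rng.normal(size=x_t.shape)
    return mean

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 10)   # short toy noise schedule
x = rng.normal(size=(4, 4))           # stand-in for the enhanced feature map
for t in reversed(range(10)):
    eps = np.zeros_like(x)            # dummy noise predictor
    x = ddpm_reverse_step(x, t, eps, betas, rng)
print(x.shape)
```

In the actual framework the predictor is the conditioned denoising network, and it is this progressive refinement that supplies the generative prior behind the perceptual-quality gains reported below.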
      Results and Discussions Extensive experiments are conducted on multiple benchmark datasets to evaluate the proposed method. Quantitative comparisons show that the proposed framework consistently outperforms representative state-of-the-art methods under different magnification settings. In the 2× light field super-resolution task, the proposed method achieves a peak signal-to-noise ratio (PSNR) of 39.43 dB and a structural similarity index (SSIM) of 0.987, representing the best performance among the compared methods. In the more challenging 4× task, the proposed approach still attains the highest results, reaching 33.70 dB in PSNR and 0.945 in SSIM. In particular, the PSNR is improved by up to 1.44 dB over existing methods, demonstrating a clear quantitative advantage.
      Qualitative results further confirm the superiority of the proposed framework. Compared with competing approaches, the reconstructed images exhibit sharper boundaries, clearer local textures, and more faithful structural recovery, especially in regions containing dense lines, repetitive patterns, or complex high-frequency details. In addition, the restored views show stronger cross-view consistency, with fewer artifacts such as blurring or misalignment. These observations indicate that the method not only improves distortion-based metrics but also enhances perceptual quality and geometric coherence, both of which are essential for light field imaging applications.
      The performance gains can be explained from several aspects. First, the dual-branch Mamba subspace scanning strategy makes full use of the intrinsic properties of light field data by separately modeling EPI structures and spatial-angular dependencies. Second, the MCI and SAM modules strengthen feature interaction, adaptive fusion, and cross-view calibration, thereby improving both discriminative ability and reconstruction stability. Third, the frequency-domain enhancement mechanism directly compensates for high-frequency information loss, which is especially beneficial for recovering textures and edge details. Finally, the diffusion-based denoising network further refines the reconstructed results by exploiting generative priors, leading to more realistic and visually pleasing outputs. Together, these modules form a unified framework in which each component contributes to performance from a complementary perspective.
      Conclusions This paper presents a light field super-resolution framework that combines Mamba-based subspace scanning with diffusion-based generative reconstruction. By jointly capturing EPI structures and spatial-angular correlations, and by integrating multi-scale cross interaction, spatial-angular modulation, and frequency-domain enhancement, the proposed method effectively addresses two core challenges in light field super-resolution: high-frequency detail loss and cross-view inconsistency. Experimental results demonstrate that the method achieves state-of-the-art performance in both objective metrics and subjective visual quality, with notable advantages in texture recovery, edge preservation, and structural consistency. These findings indicate that the proposed framework provides an effective solution for high-quality light field super-resolution. Future work may explore more lightweight architectures and more efficient diffusion strategies to reduce computational cost and extend the framework to related applications such as depth estimation, view synthesis, and light field restoration.