Abstract:
Objective Light field super-resolution (LFSR) aims to reconstruct high-resolution light field images from low-resolution observations while preserving both fine spatial details and angular consistency among multiple views. Because spatial and angular information are tightly coupled in light field data, reconstruction must cope with both complex feature structures and strong inter-view dependencies. Existing approaches often suffer from two major limitations. First, high-frequency textures, edge details, and subtle structural patterns are easily degraded during feature extraction and upsampling, resulting in blurred outputs. Second, insufficient modeling of spatial-angular correlations may introduce inconsistencies across viewpoints, impairing geometric fidelity and visual coherence. To address these issues, this paper proposes a light field super-resolution framework that integrates Mamba-based subspace scanning with diffusion-based generative reconstruction. The framework is designed to enhance high-frequency detail recovery, strengthen long-range spatial-angular dependency modeling, and improve reconstruction accuracy and cross-view consistency across different upscaling factors.
Methods The proposed framework adopts a dual-branch subspace scanning strategy based on the Mamba architecture. Considering that light field images exhibit complementary characteristics in different subspaces, two specialized branches are constructed for efficient modeling. The first branch, termed EPI-Mamba, focuses on Epipolar Plane Image (EPI) structures, which explicitly characterize geometric relationships across viewpoints. This branch is therefore used to capture directional continuity and structural variation in epipolar dimensions. The second branch, termed Spatial-Angular Mamba (SA-Mamba), is designed to model correlations between spatial content and angular variation, enabling the network to learn dependencies that are difficult to represent using conventional convolution alone. Both branches perform bidirectional scanning in their respective subspaces, allowing efficient long-range dependency modeling while maintaining relatively low computational complexity.
The serialized features extracted from the two branches are then fed into a Multi-scale Cross Interaction (MCI) module. This module promotes deep information exchange between EPI-aware features and spatial-angular features at multiple scales, thereby enhancing complementary fusion of geometric and texture information. To further refine the fused representations, a Spatial-Angular Modulation (SAM) module is introduced. This module jointly calibrates features from the spatial and angular perspectives, adaptively emphasizing informative responses and suppressing inconsistent activations. As a result, cross-view feature alignment is improved and the coherence of reconstructed light field content is strengthened.
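The following sketch illustrates one way the spatial-angular modulation idea could be realized, with two gating paths: one acting on each sub-aperture view and one acting on the angular dimension at every spatial location. The gating design and the module name SpatialAngularModulation are assumptions for illustration; the paper's MCI and SAM modules are not reproduced here.

```python
# Minimal sketch of spatial-angular gating under the assumed (B, C, U, V, H, W) layout.
import torch
import torch.nn as nn


class SpatialAngularModulation(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # Spatial gate: looks at each sub-aperture view's H x W map independently.
        self.spatial_gate = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.Sigmoid(),
        )
        # Angular gate: looks at the U x V angular map at each spatial location.
        self.angular_gate = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, fused: torch.Tensor) -> torch.Tensor:
        b, c, u, v, h, w = fused.shape
        # Spatial modulation: fold the views into the batch dimension.
        sp = fused.permute(0, 2, 3, 1, 4, 5).reshape(b * u * v, c, h, w)
        sp = self.spatial_gate(sp).reshape(b, u, v, c, h, w).permute(0, 3, 1, 2, 4, 5)
        # Angular modulation: fold the pixels into the batch dimension.
        an = fused.permute(0, 4, 5, 1, 2, 3).reshape(b * h * w, c, u, v)
        an = self.angular_gate(an).reshape(b, h, w, c, u, v).permute(0, 3, 4, 5, 1, 2)
        # Emphasize responses consistent in both subspaces, damp inconsistent ones.
        return fused * sp * an


if __name__ == "__main__":
    x = torch.randn(1, 32, 5, 5, 16, 16)
    print(SpatialAngularModulation(32)(x).shape)  # (1, 32, 5, 5, 16, 16)
```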
To mitigate the loss of high-frequency details, a frequency-domain enhancement mechanism is incorporated into the framework. Specifically, Fast Fourier Transform (FFT) is applied to the fused features to obtain frequency-domain representations, from which informative high-frequency components are selectively enhanced. This process compensates for detail attenuation, reinforces discriminative structural responses, and suppresses irrelevant or noisy signals. The enhanced features are then input into a diffusion-based denoising network. Benefiting from the strong generative capability of diffusion models, the network progressively restores fine details through iterative denoising and reconstructs high-resolution light field images after upsampling. The cooperation between subspace-aware feature extraction and diffusion-based refinement enables the framework to balance reconstruction fidelity, structural accuracy, and perceptual quality.
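As a rough illustration of the frequency-domain enhancement step, the sketch below applies a 2-D FFT to feature maps, amplifies high-frequency bins with a learnable per-channel gain weighted by a radial frequency mask, and transforms the result back. The radial mask and the gain parameterization are assumptions for illustration, not the paper's exact design.

```python
# Minimal sketch of FFT-based high-frequency enhancement (assumed parameterization).
import torch
import torch.nn as nn


class FrequencyEnhancement(nn.Module):
    def __init__(self, channels: int, boost: float = 0.5):
        super().__init__()
        # Learnable per-channel amount of high-frequency boosting.
        self.boost = nn.Parameter(torch.full((channels, 1, 1), boost))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N, C, H, W) fused feature maps (views already folded into the batch).
        h, w = x.shape[-2:]
        spec = torch.fft.rfft2(x, norm="ortho")          # complex spectrum
        # Radial frequency mask: 0 at DC, ~1 near the highest spatial frequencies.
        fy = torch.fft.fftfreq(h, device=x.device).abs().view(-1, 1)
        fx = torch.fft.rfftfreq(w, device=x.device).abs().view(1, -1)
        radius = torch.sqrt(fy ** 2 + fx ** 2)
        hf_mask = (radius / radius.max()).clamp(0, 1)
        # Selectively amplify high-frequency components; low frequencies pass unchanged.
        spec = spec * (1.0 + self.boost * hf_mask)
        return torch.fft.irfft2(spec, s=(h, w), norm="ortho")


if __name__ == "__main__":
    feats = torch.randn(4, 32, 32, 32)
    print(FrequencyEnhancement(32)(feats).shape)  # (4, 32, 32, 32)
```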
Results and Discussions Extensive experiments are conducted on multiple benchmark datasets to evaluate the proposed method. Quantitative comparisons show that the proposed framework consistently outperforms representative state-of-the-art methods across different upscaling factors. In the 2× light field super-resolution task, the proposed method achieves a peak signal-to-noise ratio (PSNR) of 39.43 dB and a structural similarity index (SSIM) of 0.987, the best performance among the compared methods. In the more challenging 4× task, the proposed approach again achieves the best results, reaching 33.70 dB in PSNR and 0.945 in SSIM. In particular, the PSNR is improved by up to 1.44 dB over existing methods, demonstrating a clear quantitative advantage.
Qualitative results further confirm the superiority of the proposed framework. Compared with competing approaches, the reconstructed images exhibit sharper boundaries, clearer local textures, and more faithful structural recovery, especially in regions containing dense lines, repetitive patterns, or complex high-frequency details. In addition, the restored views show stronger cross-view consistency, with fewer artifacts such as blurring or misalignment. These observations indicate that the method not only improves distortion-based metrics but also enhances perceptual quality and geometric coherence, both of which are essential for light field imaging applications.
The performance gains can be explained from several aspects. First, the dual-branch Mamba subspace scanning strategy makes full use of the intrinsic properties of light field data by separately modeling EPI structures and spatial-angular dependencies. Second, the MCI and SAM modules strengthen feature interaction, adaptive fusion, and cross-view calibration, thereby improving both discriminative ability and reconstruction stability. Third, the frequency-domain enhancement mechanism directly compensates for high-frequency information loss, which is especially beneficial for recovering textures and edge details. Finally, the diffusion-based denoising network further refines the reconstructed results by exploiting generative priors, leading to more realistic and visually pleasing outputs. Together, these modules form a unified framework in which each component contributes to performance from a complementary perspective.
Conclusions This paper presents a light field super-resolution framework that combines Mamba-based subspace scanning with diffusion-based generative reconstruction. By jointly capturing EPI structures and spatial-angular correlations, and by integrating multi-scale cross interaction, spatial-angular modulation, and frequency-domain enhancement, the proposed method effectively addresses two core challenges in light field super-resolution: high-frequency detail loss and cross-view inconsistency. Experimental results demonstrate that the method achieves state-of-the-art performance in both objective metrics and subjective visual quality, with notable advantages in texture recovery, edge preservation, and structural consistency. These findings indicate that the proposed framework provides an effective solution for high-quality light field super-resolution. Future work may explore more lightweight architectures and more efficient diffusion strategies to reduce computational cost and extend the framework to related applications such as depth estimation, view synthesis, and light field restoration.