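• Abstract: Wide field-of-view (wide-FOV) light field imaging captures the direction and intensity of light rays in a scene over a larger field of view, and the resulting data volume places higher demands on coding and compression. To address the coding challenges of large data volume, variable target bitrate, and uncertain content in wide-FOV light field images, a coding method aware of the target bitrate and the image content is proposed. First, a set of sparse sampling modes is constructed, and the best mode is selected adaptively with the K-nearest neighbor (KNN) algorithm based on the complexity features of the light field image and the target bitrate constraint, which reduces selection complexity. Second, a frame-level bitrate allocation strategy is designed according to the inter-frame distance between key views, improving quality consistency along the angular dimension. Finally, an LCU-level content-aware bitrate allocation strategy that combines scene saliency, depth, and regional complexity is proposed for fine-grained resource optimization. At the client side, the non-key views are reconstructed by a deep learning network to obtain the complete light field image. Experimental results show that the proposed method quickly obtains a preferred sparse sampling mode according to the target bitrate and the light field image content, significantly reduces the complexity of sparse sampling mode selection, and improves wide-FOV light field image coding performance under changing network transmission conditions. Compared with existing methods, it achieves an average bitrate saving of 45.284% and a peak signal-to-noise ratio gain of 1.641 dB, with improved angular consistency and a maximum random access penalty of only 20.29%, demonstrating good overall coding performance.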

       

      Abstract:
      Objective Wide field-of-view (wide-FOV) light field imaging records the direction and intensity of light rays in a scene over a larger field of view, but the resulting data volume places higher demands on coding and compression. This work targets three coding challenges of wide-FOV light field images, namely large data volume, variable target bitrate, and uncertain image content, and proposes a coding method that is aware of both the target bitrate and the image content.
      Methods A sparse coding and reconstruction strategy is adopted: a set of sparse sampling modes that partition the views into key views and non-key views is constructed to support the coding of wide-FOV light field images at variable bitrates. First, multi-dimensional complexity features of the light field image are extracted and combined with the target bitrate constraint in an adaptive selection scheme for sparse sampling modes. Using the K-nearest neighbor (KNN) algorithm, the scheme matches a sparse sampling mode rapidly, without traversing the entire mode set. The predictive relationships among the key views are then established as the basis for subsequent coding and compression. Second, to address the uneven view quality caused by traditional bitrate allocation methods, a frame-level bitrate pre-allocation strategy is designed according to the inter-frame spatial distance between the frame being encoded and its reference frames within the key view set; dynamically adjusting the allocation weights improves the angular quality consistency of the encoded views. Third, saliency detection and depth estimation are applied to the I-frames in the key view set to generate largest coding unit (LCU)-level saliency weights, which are propagated to all key views along the angular dimension by exploiting the angular correlation of light field images. These saliency weights are combined with regional image complexity to form a fine-grained LCU-level content-aware bitrate allocation strategy. Finally, at the client side, the key views are decoded according to the transmitted sparse sampling mode, and the non-key views are reconstructed from the decoded key views by a deep learning-based view synthesis network, completing the reconstruction of the wide-FOV light field image.
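The three encoder-side steps described in Methods can be illustrated with a minimal sketch. It is not the authors' implementation, and every function name, feature layout, and weighting formula below is an assumption made for illustration. In particular, the abstract does not state the direction or form of the distance weighting, so the sketch assumes that frames farther from their reference frames receive more bits (with the hypothetical strength parameter `alpha`), and it assumes that saliency and regional complexity are combined by a simple product.

```python
import numpy as np

def select_mode_knn(train_features, train_modes, query, k=3):
    """Majority vote over the k nearest training samples (Euclidean distance).

    Assumption: each feature vector holds complexity features plus the
    target bitrate, all normalized to comparable ranges.
    """
    X = np.asarray(train_features, dtype=float)
    y = np.asarray(train_modes, dtype=int)
    dists = np.linalg.norm(X - np.asarray(query, dtype=float), axis=1)
    nearest = np.argsort(dists, kind="stable")[:k]
    return int(np.argmax(np.bincount(y[nearest])))

def frame_level_bits(total_bits, ref_distances, alpha=1.0):
    """Split total_bits across key views by distance to the reference frame.

    Assumption: weight = 1 + alpha * normalized distance, so a frame with
    distance 0 (for example an I-frame) gets the base weight of 1.
    """
    d = np.asarray(ref_distances, dtype=float)
    scale = d.max() if d.max() > 0 else 1.0  # avoid division by zero
    weights = 1.0 + alpha * d / scale
    return total_bits * weights / weights.sum()

def lcu_level_bits(frame_bits, saliency, complexity):
    """Split one frame's bits across LCUs.

    Assumption: the per-LCU weight is saliency * complexity; if every
    weight is zero, the bits are spread uniformly.
    """
    w = np.asarray(saliency, dtype=float) * np.asarray(complexity, dtype=float)
    if w.sum() <= 0:
        w = np.ones_like(w)
    return frame_bits * w / w.sum()

if __name__ == "__main__":
    # Toy training set: [complexity feature, normalized target bitrate] -> mode index
    feats = [[0.10, 0.20], [0.15, 0.25], [0.90, 0.80], [0.85, 0.90]]
    modes = [0, 0, 1, 1]
    print("selected mode:", select_mode_knn(feats, modes, [0.12, 0.22]))
    print("frame bits:", frame_level_bits(1000.0, [0, 1, 2]))
    print("LCU bits:", lcu_level_bits(100.0, [[1, 1], [1, 3]], [[1, 1], [1, 1]]))
```

The lookup cost of the KNN step depends on the number of stored training samples rather than on encoding every candidate mode, which is the sense in which the mode set need not be traversed. Both allocation functions preserve the bit budget they are given, so the LCU-level split can be applied to the output of the frame-level split.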
      Results and Discussions The proposed adaptive sparse sampling mode selection scheme, which fuses multi-dimensional features and predicts the mode with the KNN algorithm, significantly reduces the computational complexity of mode selection and avoids the high computational cost of a traditional brute-force search. It rapidly matches diverse target bitrates and image content features to an optimal or near-optimal sparse sampling mode. The two-level bitrate allocation strategy, which combines frame-level pre-allocation with LCU-level content-aware allocation, mitigates uneven quality across angular views while remaining compatible with existing video coding standards, further improving the angular quality consistency of wide-FOV light field images. Experimental results show that, compared with existing representative methods, the proposed method achieves an average bitrate saving of 45.284% and a peak signal-to-noise ratio (PSNR) improvement of 1.641 dB, with the maximum random access penalty limited to 20.29%. These results indicate that the method improves coding efficiency while preserving favorable random access performance, which supports rapid view retrieval in practical applications.
      Conclusions The proposed wide-FOV light field image coding method, which is aware of both the target bitrate and the image content, adapts effectively to variable bitrates and uncertain image content. It reduces the complexity of sparse sampling mode selection and improves adaptability to changing network transmission conditions. It also saves bitrate, improves image quality and angular quality consistency, and keeps the random access penalty low, demonstrating strong overall coding performance and practical application value.