Abstract:
Objective This work targets the coding of wide-FOV light field images under variable target bitrates and uncertain image content. Its core strategy is sparse coding and reconstruction: a sparse sampling mode set that divides the views into key views and non-key views is constructed, the optimal mode is selected adaptively for each image and target bitrate, only the key views are encoded and transmitted, and the non-key views are reconstructed at the client side.
Methods The proposed method adopts sparse coding and reconstruction as its core strategy and constructs a sparse sampling mode set comprising key views and non-key views, which provides the foundation for coding wide-FOV light field images at variable bitrates. The scheme proceeds as follows. Firstly, multi-dimensional complexity features are extracted from the light field image, and an adaptive sparse sampling mode selection scheme is designed under the target bitrate constraint. Using the K-nearest neighbor (KNN) algorithm, the scheme rapidly matches the optimal mode, so the best sparse sampling strategy is identified without traversing the entire mode set. On this basis, predictive relationships among the key views are established, providing a more robust basis for subsequent coding and compression. Secondly, to address the uneven view quality caused by traditional bitrate allocation methods, a frame-level bitrate pre-allocation strategy is designed according to the inter-frame spatial distance between the currently encoded frame and its reference frames within the key view set. By dynamically adjusting the bitrate allocation weights, this strategy improves the angular quality consistency of the encoded views. Then, saliency detection and depth estimation are integrated on the I-frames of the key view set to generate largest coding unit (LCU)-level saliency weights, which are propagated to all key views along the angular dimension by exploiting the angular correlation of light field images. Furthermore, the saliency weights are combined with regional image complexity to form a fine-grained, content-aware LCU-level bitrate allocation strategy that allocates bits more accurately.
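The abstract does not specify the features, the distance metric, or the value of K used for mode matching. The Python sketch below shows one plausible KNN lookup under stated assumptions: each training sample is a hypothetical feature vector (two complexity features plus the target bitrate) labelled offline with its best mode, features are z-score normalised, and the mode is chosen by majority vote among the K nearest samples. The labels "sparse" and "dense" are placeholders, not the paper's actual mode set.

```python
import numpy as np
from collections import Counter


def select_mode_knn(train_features, train_modes, query, k=3):
    """Pick a sparse sampling mode by majority vote of the k nearest samples.

    train_features: (N, D) rows of [complexity features..., target bitrate]
    train_modes:    N mode labels, assumed to come from an offline search
    query:          (D,) features of the image to encode, same layout
    """
    X = np.asarray(train_features, dtype=float)
    q = np.asarray(query, dtype=float)
    # z-score each column so the bitrate (hundreds of kbps) does not
    # swamp the small-valued complexity features in the distance
    mu, sigma = X.mean(axis=0), X.std(axis=0)
    sigma[sigma == 0] = 1.0
    dist = np.linalg.norm((X - mu) / sigma - (q - mu) / sigma, axis=1)
    nearest = np.argsort(dist)[:k]
    # Counter keeps first-seen order on ties, so a tie goes to the
    # label whose first occurrence is closest to the query
    votes = Counter(train_modes[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical training set: [texture complexity, disparity range, kbps]
train_X = [
    [0.20, 1.0, 300], [0.30, 1.2, 350], [0.25, 0.9, 320],
    [0.80, 3.5, 2000], [0.90, 4.0, 2200], [0.85, 3.8, 1900],
]
train_y = ["sparse", "sparse", "sparse", "dense", "dense", "dense"]

print(select_mode_knn(train_X, train_y, [0.28, 1.1, 330]))   # sparse
print(select_mode_knn(train_X, train_y, [0.87, 3.7, 2100]))  # dense
```

Normalisation matters in this sketch because the bitrate column is orders of magnitude larger than the complexity features; without it, the vote would degenerate into a bitrate lookup.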
Finally, at the client side, the key views are decoded according to the transmitted optimal sparse sampling mode, and the non-key views are reconstructed from the decoded key views by a deep learning-based view synthesis network, completing the reconstruction of the wide-FOV light field image.
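The two-level bitrate allocation in the Methods can be sketched as follows. The abstract gives no weighting formulas, so the ones below are assumptions for illustration only: a key view's frame-level weight grows linearly with its mean angular distance to its reference views (on the premise that farther references predict worse), the I-frame receives a fixed weight, and each LCU's share of the frame budget is proportional to its complexity scaled by `(1 + lam * saliency)`. The parameters `alpha`, `i_frame_weight` and `lam` and the view layout are hypothetical.

```python
import numpy as np


def frame_level_budgets(total_bits, views, refs, alpha=0.5, i_frame_weight=3.0):
    """Pre-allocate a bit budget across key views (hypothetical weighting).

    views: view id -> (u, v) position on the angular grid
    refs:  view id -> list of reference view ids ([] for the I-frame)
    """
    weights = {}
    for vid, (u, v) in views.items():
        if not refs[vid]:
            # intra-coded view: no reference, fixed (assumed) weight
            weights[vid] = i_frame_weight
        else:
            # assumption: the farther the references, the worse the
            # prediction, so the view gets more bits to keep quality even
            d = np.mean([np.hypot(u - views[r][0], v - views[r][1]) for r in refs[vid]])
            weights[vid] = 1.0 + alpha * d
    total_w = sum(weights.values())
    return {vid: total_bits * w / total_w for vid, w in weights.items()}


def lcu_level_budgets(frame_bits, saliency, complexity, lam=1.0):
    """Split one frame's budget over its LCU grid.

    saliency:   per-LCU weights in [0, 1] (e.g. propagated from the I-frame)
    complexity: per-LCU positive complexity measure (e.g. pixel variance)
    """
    w = np.asarray(complexity, dtype=float) * (1.0 + lam * np.asarray(saliency, dtype=float))
    return frame_bits * w / w.sum()


views = {"I": (2, 2), "P1": (2, 4), "B1": (2, 3)}
refs = {"I": [], "P1": ["I"], "B1": ["I", "P1"]}
budgets = frame_level_budgets(6500, views, refs)
print({vid: round(float(b)) for vid, b in budgets.items()})  # {'I': 3000, 'P1': 2000, 'B1': 1500}
print(lcu_level_budgets(budgets["B1"], [[0.0, 1.0]], [[1.0, 1.0]]).tolist())  # [[500.0, 1000.0]]
```

Keeping the two levels separate mirrors the abstract's structure: the frame-level pass fixes how many bits each key view gets, and the LCU-level pass only redistributes that fixed budget inside a frame, so it cannot upset the inter-view balance.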
Results and Discussions The proposed adaptive sparse sampling mode selection scheme, which combines multi-dimensional feature fusion with deep learning prediction mechanisms, significantly reduces the computational complexity of mode selection and avoids the prohibitive cost of traditional brute-force search. For diverse target bitrates and image content, it rapidly selects the optimal or a near-optimal sparse coding mode. The two-level bitrate allocation strategy, which combines frame-level pre-allocation with LCU-level perceptual allocation, effectively mitigates uneven quality across angular views while remaining compatible with existing video coding standards, further improving the angular quality consistency of wide-FOV light field images. Experimental results show that, compared with existing representative methods, the proposed method achieves an average bitrate saving of 45.284% and a peak signal-to-noise ratio (PSNR) improvement of 1.641 dB, while limiting the maximum random access penalty of light field images to 20.29%. These results indicate that the method not only improves coding efficiency but also maintains favorable random access performance, meeting the demand for rapid view retrieval in practical applications.
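The abstract does not name the metric behind the average bitrate saving and the PSNR gain. In codec evaluation such figures are conventionally Bjøntegaard delta (BD) measures, so the sketch below assumes BD-rate: it fits log-bitrate as a cubic polynomial of PSNR for the anchor and for the tested codec, then compares their mean log-bitrate over the overlapping PSNR range. The RD points are made-up illustrative numbers, not results from this work.

```python
import numpy as np


def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Bjontegaard delta rate in percent (negative means the test codec saves bits).

    Each RD curve is fitted as log(rate) = cubic polynomial in PSNR, and the
    two fits are compared by their mean value over the shared PSNR interval.
    Needs at least four RD points per curve.
    """
    p_a = np.polyfit(psnr_anchor, np.log(rate_anchor), 3)
    p_t = np.polyfit(psnr_test, np.log(rate_test), 3)
    lo = max(min(psnr_anchor), min(psnr_test))
    hi = min(max(psnr_anchor), max(psnr_test))
    ia, it = np.polyint(p_a), np.polyint(p_t)
    area_a = np.polyval(ia, hi) - np.polyval(ia, lo)
    area_t = np.polyval(it, hi) - np.polyval(it, lo)
    mean_log_diff = (area_t - area_a) / (hi - lo)
    return (np.exp(mean_log_diff) - 1.0) * 100.0


# Hypothetical RD points (kbps, dB): the test codec reaches the same PSNR
# at half the anchor's bitrate, so BD-rate should be about -50 %.
anchor_rates, psnrs = [100, 200, 400, 800], [30.0, 33.0, 36.0, 39.0]
test_rates = [50, 100, 200, 400]
print(round(bd_rate(anchor_rates, psnrs, test_rates, psnrs), 3))  # -50.0
```

Working in the log-rate domain makes the measure a relative saving, which is why a uniform halving of bitrate yields roughly -50% regardless of the absolute rates.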
Conclusions The proposed wide-FOV light field image coding method, based on target bitrate and image content awareness, adapts effectively to the coding requirements of variable bitrates and uncertain image content. It reduces the complexity of sampling mode selection and improves adaptability to network transmission. It also meets multiple objectives, including bitrate saving, image quality enhancement, improved angular quality consistency, and optimized random access, demonstrating superior overall coding performance and practical application value.