Citation: Liu XY, Zhang JC, Leng BR et al. Edge enhanced depth perception with binocular meta-lens. Opto-Electron Sci 3, 230033 (2024). doi: 10.29026/oes.2024.230033
Supplementary information for Edge enhanced depth perception with binocular meta-lens |
Schematic of the edge-enhanced spatial computing with binocular meta-lens. Two texture-less letter objects with no background are placed in front of the binocular meta-lens. A binocular meta-lens is designed and fabricated to build the stereo vision system for texture-less spatial computing scenarios. Edge-enhanced depth perception is realized with the support of the proposed neural network.
Optical and SEM images of fabricated binocular meta-lens. (a) Optical image of the binocular meta-lens. (b) The zoomed-in top-view SEM image of the meta-lens. (c) The zoomed-in tilted-view SEM image at the edge of the meta-lens.
Disparity computation with neural network. (a) Architecture overview of the proposed neural network H-Net with the H-Module. The stereo images are processed by weight-sharing backbones to extract features. These features are then combined through cross-pixel and cross-view interactions in an H-Module. A 4D cost volume is built from the left and right image features and fed into a 3D CNN for depth estimation. A disparity regression module is applied before the final disparity map is predicted. (b) Detailed pipeline of the cross-pixel interaction. The left and right feature maps are flattened and passed through separate fully connected layers to generate Query, Key, and Value vectors. The inner product between Query and Key measures their similarity, yielding a weight coefficient for each Key. These coefficients drive the cross-pixel attention, associating each Key with its corresponding Value, and the weighted Values are aggregated to produce enhanced features. (c) Detailed pipeline of the cross-view interaction. It differs from the cross-pixel interaction in that the Key and Query vectors entering the inner product come from different stereo views.
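To make the Query/Key/Value description above concrete, the following is a minimal NumPy sketch of scaled dot-product attention used in cross-pixel mode (Query, Key, and Value from the same view) and in cross-view mode (Key and Value from the other view). All shapes, the feature dimension, and the random projection weights are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def attention(query_feat, key_feat, value_feat, dim=64, seed=0):
    """Project inputs to Query/Key/Value and aggregate Values weighted by Query-Key similarity."""
    rng = np.random.default_rng(seed)
    # Fully connected projections; random weights stand in for learned ones.
    Wq = rng.standard_normal((query_feat.shape[-1], dim))
    Wk = rng.standard_normal((key_feat.shape[-1], dim))
    Wv = rng.standard_normal((value_feat.shape[-1], dim))
    Q, K, V = query_feat @ Wq, key_feat @ Wk, value_feat @ Wv
    # Inner-product similarity between every Query and every Key, softmax-normalized.
    scores = Q @ K.T / np.sqrt(dim)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V  # weighted Values aggregated into enhanced features

# Flattened left/right feature maps: (num_pixels, channels).
left = np.random.rand(256, 32)
right = np.random.rand(256, 32)

cross_pixel = attention(left, left, left)    # Query, Key, Value from the same view
cross_view = attention(left, right, right)   # Key and Value come from the other view
print(cross_pixel.shape, cross_view.shape)   # (256, 64) (256, 64)
```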
Characterization of binocular meta-lens. (a) X-Z plane focusing profiles of the left and right meta-lenses at a wavelength of 532 nm. The measured focal lengths of the left and right meta-lenses are 10.048 mm and 10.046 mm, respectively, denoted by the yellow dashed lines. (b) Designed phase distribution of the meta-lens. (c) Corresponding measured phase distribution of the meta-lens in (b).
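The exact design formula behind the phase distribution in (b) is not stated in this caption; for orientation only, a standard hyperbolic focusing phase evaluated at the 532 nm operating wavelength with a focal length of about 10.05 mm (matching the measurement above) would take the assumed form:

```latex
% Assumed hyperbolic focusing phase profile of a metalens,
% with \lambda = 532\,\mathrm{nm} and f \approx 10.05\,\mathrm{mm} taken from the caption above.
\varphi(x, y) = -\frac{2\pi}{\lambda}\left(\sqrt{x^{2} + y^{2} + f^{2}} - f\right)
```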
Edge-enhanced depth perception of various objects. The first column is the raw left image. The second column is the corresponding depth map. The third column is the edge-enhanced depth map. The second and third columns share the color bar on the right of the third column. The fourth column is the integrated image of the raw image and the edge-enhanced depth map. (a) Two pieces of transparent plastic paper printed with "RIKEN" and "CITYU" in black letters are placed at 16.0 cm and 12.8 cm, respectively. (b) A piece of sketch paper printed with a tilted three-dimensional building is located at 17.3 cm as the background. The front ends of the two toy cars are at approximately 12.9 cm and 15.7 cm, respectively. (c) The two architectural sketches are at 13.5 cm and 16.5 cm, respectively. (d) The background architectural sketch is positioned at 17.3 cm. The depth of a toy car's body ranges from 12.5 cm to 15.5 cm.
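As an illustration of how such an edge-enhanced depth map and its overlay on the raw image could be assembled, here is a minimal NumPy sketch. The baseline, pixel pitch, placeholder disparity map, and gradient-based edge threshold are all assumptions for demonstration; only the focal length is taken from the characterization above.

```python
import numpy as np

f = 10.05e-3   # focal length of the meta-lens [m], from the measured values above
B = 20e-3      # assumed baseline between the left and right meta-lenses [m]
p = 3.45e-6    # assumed pixel pitch of the image sensor [m]

# Placeholder left image and disparity map standing in for the network output.
left_img = np.random.rand(128, 128)
disparity = np.random.uniform(10, 40, size=(128, 128))   # [pixels]

depth = f * B / (disparity * p)                          # triangulation: Z = f*B/d

# Gradient-magnitude edge mask of the left image (stand-in for the edge map).
gy, gx = np.gradient(left_img)
grad = np.hypot(gx, gy)
edges = grad > np.quantile(grad, 0.8)                    # keep the strongest 20% of gradients

edge_enhanced_depth = np.where(edges, depth, np.nan)     # depth kept only along edges
overlay = np.where(edges, depth / depth.max(), left_img) # edge depth overlaid on the raw image
print(np.nanmin(edge_enhanced_depth), np.nanmax(edge_enhanced_depth), overlay.shape)
```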