Optical scanning endoscope via a single multimode optical fiber

Optical endoscopy has become an essential diagnostic and therapeutic approach in modern biomedicine for directly observing organs and tissues deep inside the human body, enabling non-invasive, rapid diagnosis and treatment. Optical fiber endoscopy is highly competitive among various endoscopic imaging techniques due to its high flexibility, compact structure, excellent resolution, and resistance to electromagnetic interference. Over the past decade, endoscopes based on a single multimode optical fiber (MMF) have attracted widespread research interest due to their potential to significantly reduce the footprint of optical fiber endoscopes and enhance imaging capabilities. In comparison with other imaging principles of MMF endoscopes, the scanning imaging method based on the wavefront shaping technique is highly developed and provides benefits including excellent imaging contrast, broad applicability to complex imaging scenarios, and good compatibility with various well-established scanning imaging modalities. In this review, various technical routes to achieve light focusing through MMF and procedures to conduct the scanning imaging of MMF endoscopes are introduced. The advancements in imaging performance enhancements, integrations of various imaging modalities with MMF scanning endoscopes, and applications are summarized. Challenges specific to this endoscopic imaging technology are analyzed, and potential remedies and avenues for future developments are discussed.


Introduction
Endoscopes are widely employed visual instruments in biomedicine, allowing in-depth, rapid and minimally invasive imaging of objects that cannot be directly accessed from surfaces. In clinical medicine, specialized endoscopes designed for various tubular or cavity organs, such as the gastrointestinal tract, throat, nasal passages and brain ventricles, have been developed [1]. To alleviate the discomfort of patients during examination and enhance the reliability of diagnosis, medical endoscopes inherently demand continuous miniaturization, excellent spatiotemporal resolution, and high image quality. In life science, similar requirements are placed on endoscopes to achieve high-resolution imaging while minimizing invasiveness to biological tissues or cells [2]. These application demands are driving the fast development of endoscopes. So far, diverse endoscopes based on different principles have been created and can be broadly categorized as follows: rigid lens endoscopes, electronic endoscopes, and optical fiber endoscopes [3]. Among them, optical fiber endoscopes have some unique advantages, including high flexibility, small probe size, ease of high-temperature sterilization, and resistance to electromagnetic interference, which make them irreplaceable in specific applications [4,5]. The inception of optical fiber endoscopes dates back to 1954, when Hopkins et al. pioneered their development [6,7]. This even predates the proof that optical fiber could serve as a medium for extremely low-loss communication [8]. Optical fiber endoscopes can be categorized into fiber bundle endoscopes, single-mode fiber scanning endoscopes, and multimode fiber endoscopes, depending on the type of optical fiber employed.
The optical fiber bundle endoscope is the earliest fiber-based endoscope. Fiber bundles comprise numerous individual optical fibers, with each fiber independently transmitting one pixel of the image. As a result, the obtained image is pixelated. Intuitively, the imaging resolution of fiber bundles depends on the size of the cores and the distance between adjacent cores. At present, commercially available optical fiber bundles have core radii around 1 μm and center-to-center spacings around 3 μm [9,10]. The single-mode fiber scanning endoscope is the other mature fiber-based endoscope. Single-mode optical fibers can only transmit optical energy because all spatial information carried by the incident light is merged into a single guided mode. To focus light and perform scanning, the distal end of the single-mode optical fiber is integrated with a two-dimensional microelectromechanical system (MEMS) mirror or a piezoelectric actuator and lens system [11−13]. The imaging properties of single-mode optical fiber scanning endoscopes differ significantly from those of fiber bundle endoscopes. The physical resolution limit of this endoscope is determined by the point spread function of the lens system, while its practical resolution depends on the scanning motion and sampling rate [14,15]. A large field-of-view (FOV) can be obtained by increasing the displacement of the MEMS mirror or the resonance amplitude of the fiber tip. The fiber bundle endoscope and the single-mode fiber scanning endoscope are both well-established endoscopic techniques and have been employed in many biomedical applications [16]. For most imaging applications involving relatively large tubular or cavity structures within the human body, these two fiber-based endoscopes already meet the demands effectively. However, they are currently not suitable for some extreme scenarios, such as high-resolution in vivo imaging of deep brain areas and of tubular structures only a few hundred micrometers in size, such as the Schlemm's canal in the human eye [17]. In such scenarios, the footprints of fiber bundle endoscopes and single-mode optical fiber scanning endoscopes are relatively large, which prevents them from accessing the desired area of the sample or causes substantial mechanical damage to the tissue [10,18]. Besides, the imaging resolution of fiber bundle endoscopes is restricted by the spacing between cores and by the fabrication process [19−21]. In this context, MMF-based endoscopes are potential solutions for these applications.
Unlike single-mode optical fibers, multimode optical fibers (MMFs) support the transmission of numerous guided modes, offering a large number of degrees of freedom for manipulating the optical field and thereby allowing for all-optical scanning [22]. As illustrated in Fig. 1, focused light spots at the distal end of an MMF are obtained by employing a spatial light modulator (SLM) to perform wavefront shaping on the incident optical field [23,24]. The wavefront shaping information used to modulate light for focusing can be acquired by transmission matrix measurements, digital phase conjugation and phase optimization algorithms [22,25,26]. So far, the MMF scanning endoscope (MFSE) has achieved diffraction-limited foci and submicron imaging resolution. Compared to the optical fiber bundle, the effective pixel density delivered by an MMF can be significantly higher [25,27]. Moreover, the MFSE focuses light in an all-optical manner without integrating lenses or actuators at the distal end of the MMF. Therefore, the size of the endoscope probe is determined only by the diameter of the MMF, which is smaller than the probe of the single-mode fiber scanning endoscope. It should be emphasized, however, that MMF-based endoscopes are still under development and are not yet operational in practical applications. Several issues in the MFSE technique remain to be addressed. For example, the scanning imaging performance of the MFSE is highly susceptible to external dynamic disturbances and to deformation of the MMF. Besides, the entire calibration and imaging apparatus of the MFSE is complex. Therefore, substantial efforts are still needed to promote the practical applications of the MFSE.
In this article, the optical scanning endoscope using a single MMF based on the wavefront shaping technique is comprehensively reviewed. Notably, in addition to generating tightly focused light spots for scanning imaging, there are other mechanisms to achieve endoscopic imaging through the MMF [28], including widefield imaging by inverse methods [29−31], compressive imaging [32−34] and deep learning-based imaging methods [35−38]. Each of the endoscopic imaging techniques grounded in these mechanisms possesses a wealth of research content. To avoid covering too many topics without in-depth exploration, this review focuses only on MMF scanning imaging using focused light spots. Compared to these alternative imaging mechanisms, the MFSE is relatively more mature and can be applied to complex imaging scenarios. Many advanced imaging modalities, such as confocal endoscopy, two-photon imaging, Raman imaging and photoacoustic imaging, can be integrated with the MFSE system to comprehensively enhance its imaging capacity. The contents of this review are illustrated in Fig. 1 and arranged as follows. Section "Principle of multimode optical fiber scanning endoscope" introduces the three technical routes to generate focused light spots through MMFs and the scanning imaging process of MFSEs. Section "Scanning imaging performance" analyzes several key scanning imaging performance metrics of the MFSE and introduces strategies to improve these metrics. In Section "Different scanning imaging modalities", integrations of MFSEs with other advanced scanning imaging modalities and the corresponding performance improvements are outlined. In Section "Applications", the progress of the MFSE in applications is introduced. Challenges faced by the current MFSE system and the prospects of this technology are analyzed in Section "Challenge and outlook", followed by a summary in the last section.

Principle of multimode optical fiber scanning endoscope
Optical field propagation properties of MMF
Over the past decade, considerable attention has been devoted to the development of endoscopes based on a single MMF. As the name implies, MMFs can support the propagation of numerous eigenmodes. The number of eigenmodes supported by a specific MMF is determined by the core radius $a$ of the optical fiber, the refractive indices of the core and cladding, and the wavelength $\lambda$ of the incident light. Commonly, MMFs are categorized into step-index and graded-index MMFs based on their refractive index distributions. In the case of step-index MMFs, the number of modes can be estimated using the following formula (Chapter 8, page 293 of ref. 39):
$$M \approx \frac{4}{\pi^{2}}V^{2}, \qquad V = \frac{2\pi a}{\lambda}\sqrt{n_1^{2}-n_2^{2}}, \tag{1}$$
where $n_1$ and $n_2$ are the refractive indices of the core and cladding of the optical fiber, respectively. In the case of graded-index MMFs, the cores typically conform to the following refractive index profile (Chapter 8, page 288 of ref. 39):
$$n^{2}(r) = n_1^{2}\left[1 - 2\left(\frac{r}{a}\right)^{p}\Delta\right], \quad r \le a, \qquad \Delta = \frac{n_1^{2}-n_2^{2}}{2n_1^{2}}, \tag{2}$$
where $n_1$ is the maximum refractive index value inside the core and $n_2$ is the refractive index of the cladding. $r$ is the distance from a specific point inside the core to the center. $p$ is a parameter that determines the steepness of the graded-index profile. The number of modes supported by this kind of graded-index MMF can be estimated using the following formula (Chapter 8, page 293 of ref. 39):
$$M \approx \frac{p}{p+2}\,\frac{V^{2}}{2}. \tag{3}$$
Graded-index MMFs typically exhibit lower modal dispersion and better resistance to external disturbances. Therefore, they are generally preferred in MMF scanning imaging applications.
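To get a feeling for the mode counts discussed above, the following minimal sketch evaluates the standard textbook estimates, assuming $M \approx 4V^2/\pi^2$ for a step-index fiber and $M \approx \frac{p}{p+2}\frac{V^2}{2}$ for a power-law graded-index fiber, with $V = 2\pi (a/\lambda)\sqrt{n_1^2 - n_2^2}$. The fiber parameters are hypothetical values chosen for illustration, not taken from any cited work.

```python
import math

def v_number(core_radius_um, wavelength_um, n_core, n_clad):
    """Normalized frequency V = 2*pi*(a/lambda)*sqrt(n1^2 - n2^2)."""
    numerical_aperture = math.sqrt(n_core**2 - n_clad**2)
    return 2.0 * math.pi * core_radius_um / wavelength_um * numerical_aperture

def modes_step_index(v):
    """Approximate mode count of a step-index fiber (valid for V >> 1)."""
    return 4.0 * v**2 / math.pi**2

def modes_graded_index(v, p=2.0):
    """Approximate mode count for a power-law index profile with exponent p."""
    return p / (p + 2.0) * v**2 / 2.0

# Hypothetical fiber: 50 um core diameter, NA of roughly 0.22, 633 nm light.
v = v_number(core_radius_um=25.0, wavelength_um=0.633,
             n_core=1.4570, n_clad=1.4403)

print(f"V number:              {v:.1f}")
print(f"step-index modes:      {modes_step_index(v):.0f}")
print(f"graded-index (p = 2):  {modes_graded_index(v, p=2.0):.0f}")
```

For a parabolic profile (p = 2) the estimate reduces to V²/4, so a graded-index fiber supports fewer modes than a step-index fiber with the same core size and numerical aperture.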
In practical usage, the modes excited within an MMF are determined by the relative position and angle of the incident light field with respect to the input face of the optical fiber. The excited eigenmodes propagate within the MMF with different propagation constants, leading to what is known as modal dispersion, as illustrated in Fig. 2. When an optical field carrying ordered information enters an MMF, it is decomposed into a series of eigenmodes. During propagation, mode coupling may occur among these eigenmodes due to non-uniformities of the MMF along the propagation direction or external disturbances. At the output end of the MMF, the phase differences, relative amplitudes and polarization states among these eigenmodes become completely different from the initial state due to modal dispersion and mode coupling [40]. As a result, the total optical field formed by the superposition of the eigenmodes exhibits a speckled distribution at the output end of the MMF. The information carried in the optical field is also scrambled. Therefore, MMFs are not suitable for direct image transmission or the formation of focused light spots. To obtain a focused light spot for scanning imaging through the MMF, measures must be taken to proactively compensate for the information distortion caused by modal dispersion and mode coupling.

Technical routes for light focusing
So far, numerous techniques based on the wavefront shaping principle have been put forth to counteract the modal scrambling of the MMF and generate focused light spots for scanning imaging. These techniques include the transmission matrix measurement method, the digital phase conjugation method and phase optimization algorithms. Next, we introduce these techniques in detail.

Transmission matrix measurement method
The transmission matrix measurement method was initially developed to enable the use of disordered media for imaging applications [31,41]. Because of the resemblance between modal dispersion in MMFs and the linear scattering effect in disordered media, it was adapted for use in MMF endoscopic imaging. In this method, the MMF is regarded as a linear system. The input and output ends of the optical fiber are divided into a series of orthogonal spatial modes by a square grid, as illustrated in Fig. 3(a). If the optical field is excited in a specific spatial mode at the input end, the response at the output end, including the amplitude and phase of the optical field, i.e. the impulse response function of the linear system, can be measured. Once the responses at the output end for all input modes are acquired, we obtain the complete transmission matrix $T$ of the MMF, which can be used to calculate the output optical field corresponding to an arbitrary input optical field by the formula:
$$E_{\mathrm{out}} = T\,E_{\mathrm{in}}, \tag{4}$$
where $E_{\mathrm{in}}$ and $E_{\mathrm{out}}$ are column vectors, in which each element represents the complex amplitude of the corresponding spatial mode. Each column of $T$ corresponds to the measured impulse response function for a specific input spatial mode. In the MFSE, it is desired to generate a series of focal spots at the output end of the MMF for scanning imaging. The wavefront shaping information to generate focal spots is first derived by calculating the inverse of the transmission matrix. Subsequently, it is loaded onto the input optical field of the MMF by an SLM. Therefore, the cornerstone of this technical route lies in the precise measurement of the transmission matrix.
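The column-by-column calibration and the subsequent focusing step can be sketched numerically. The example below is a toy model under stated assumptions: the fiber is replaced by a random complex Gaussian matrix (`T_true`, hypothetical), the complex output field is assumed to be directly measurable, and focusing uses the conjugate of one row of the measured matrix as a simple stand-in for the matrix inversion mentioned in the text.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_out = 256, 400            # number of input and output spatial modes

# Toy stand-in for the unknown fiber (assumption: random complex Gaussian).
T_true = (rng.normal(size=(n_out, n_in))
          + 1j * rng.normal(size=(n_out, n_in))) / np.sqrt(2)

def fiber(e_in):
    """Propagate an input field through the toy fiber: E_out = T E_in."""
    return T_true @ e_in

# Calibration: excite one input mode at a time and store the response
# as the corresponding column of the measured transmission matrix.
T_meas = np.empty((n_out, n_in), dtype=complex)
for n in range(n_in):
    probe = np.zeros(n_in, dtype=complex)
    probe[n] = 1.0
    T_meas[:, n] = fiber(probe)

# Focusing: conjugate of the target row, normalized to unit input power.
target = 123
e_in = np.conj(T_meas[target, :])
e_in /= np.linalg.norm(e_in)

intensity = np.abs(fiber(e_in)) ** 2
background = np.delete(intensity, target).mean()
enhancement = intensity[target] / background
print(f"brightest output mode: {int(np.argmax(intensity))}")
print(f"enhancement over mean background: {enhancement:.0f}")
```

In this idealized model the enhancement scales with the number of controlled input modes, which is why a large number of degrees of freedom is desirable.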
Generally speaking, the transmission matrix is a complex matrix, in which the magnitudes and arguments of the elements $T_{mn}$ represent the amplitude and phase of the response optical field for a given input spatial mode, respectively. If cameras are used to record the response optical field, only the intensity can be measured and the phase information is lost. To obtain the phase profile of the response optical field at the output end of the MMF, full-field interferometric methods are often adopted. Schematics of optical paths to measure the complex transmission matrix of MMFs or disordered media using internal and external references are presented in Fig. 3(b) and 3(c), respectively. In the case of an internal reference optical path, light from a portion of the wavefront shaping device, such as a liquid crystal spatial light modulator (LC-SLM) or a digital micro-mirror device (DMD), is held constant and directed into the MMF as the reference light. Light from the other spatial modes is sequentially coupled into the MMF and interferes with the reference light. When the modulation device is an LC-SLM, a blazed phase grating should be loaded at the tested spatial mode to ensure that its first-order diffracted beam precisely enters the MMF, while light from the other spatial modes, which are not modulated by the blazed phase grating, is blocked by the aperture and does not participate in the interference [23]. If a DMD is employed, it should be encoded with a specific binary amplitude Lee hologram [43]. To derive the phase of the response optical field from the interference pattern, phase-shifting interferometry is commonly employed, such as the two-phase, three-phase and four-phase methods [41,44−46]. The four-phase method is widely used as it does not require prior measurements of the intensities of the response optical fields. According to the four-phase method, the phase profile $\varphi_{mn}$ of the response optical field can be calculated by
$$\varphi_{mn} = \arg\left[\left(I_{mn}^{0} - I_{mn}^{\pi}\right) + i\left(I_{mn}^{3\pi/2} - I_{mn}^{\pi/2}\right)\right], \tag{5}$$
where $I_{mn}^{0}$, $I_{mn}^{\pi/2}$, $I_{mn}^{\pi}$ and $I_{mn}^{3\pi/2}$ are the interference intensities in the output mode when the phase difference between the reference light and the tested spatial mode is shifted by 0, $\pi/2$, $\pi$ and $3\pi/2$. As such, if the intensity of the reference light is also measured in advance, both the magnitude and argument of the complex transmission matrix can be obtained. The internal reference method adopts a common-path geometry and has the advantages of a simple measurement setup and insensitivity to environmental noise during calibration. However, the internal reference method inherently has some problems. For example, the reference light also passes through the MMF, resulting in a speckle-like reference pattern. In the speckle pattern, some positions do not have sufficient optical power, which causes large measurement errors in phase-shifting interferometry and leads to "blind spots" in the obtained transmission matrix. Besides, the polarization parameter is not easy to optimize in this method [23,47]. Recently, Jákl et al. demonstrated that the problem of "blind spots" can be mitigated by conducting multiple measurements of the transmission matrix using various internal references [48,49]. Collard et al. selected internal reference beams with more uniform amplitude profiles in the interference plane to conduct the calibration and achieved a higher focusing efficiency with the measured transmission matrix [50].

Fig. 3 source credits: ... ref. 41, American Physical Society; (c) ref. 42, Springer Nature, under a Creative Commons Attribution 4.0 International License; (d) ref. 29, American Physical Society.
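The four-phase method can be checked with a short simulation. This is a sketch under assumptions: the unknown response column `t_col` and the speckled internal reference `ref` are random complex vectors invented for illustration, the camera is noise-free, and the phase shift is applied to the tested mode relative to the reference. With the shifts 0, π/2, π and 3π/2, the combination (I⁰ − I^π)/4 + i(I^{3π/2} − I^{π/2})/4 returns the response multiplied by the conjugate of the reference field.

```python
import numpy as np

rng = np.random.default_rng(1)
n_out = 64

# Hypothetical unknown response (one column of T) and reference speckle.
t_col = rng.normal(size=n_out) + 1j * rng.normal(size=n_out)
ref = rng.normal(size=n_out) + 1j * rng.normal(size=n_out)

def camera(alpha):
    """Intensity-only detection of reference plus phase-shifted test field."""
    return np.abs(ref + t_col * np.exp(1j * alpha)) ** 2

i_0, i_90, i_180, i_270 = (camera(a) for a in
                           (0.0, np.pi / 2, np.pi, 3 * np.pi / 2))

# Four-phase combination: equals conj(ref) * t_col for noise-free data.
z = (i_0 - i_180) / 4 + 1j * (i_270 - i_90) / 4
phase = np.angle(z)   # phase of the response relative to the reference

print(np.allclose(z, np.conj(ref) * t_col))
```

Because the recovered quantity is scaled by the reference amplitude, output positions where the reference speckle is dark carry little signal, which is the origin of the "blind spots" once measurement noise is present.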
As an alternative, the external reference optical path has also been employed to measure the transmission matrix, circumventing certain issues inherent to the internal reference method. A typical calibration optical path of the external reference method is displayed in Fig. 3(c). The reference light propagates in a single-mode fiber and generates a Gaussian beam with a relatively uniform intensity distribution to interfere with the light output from the MMF, thereby eliminating the "blind spots". However, since the reference and test light propagate along different optical paths, mechanical and thermal drifts of the setup during calibration can destabilize the measurement results. To keep the phase relationship between the individual optical paths stable, a feedback loop is essential, which adds complexity to the calibration apparatus. Besides, to improve the contrast of the interference pattern, the intensity of the reference light should be properly selected. Internal and external reference methods each have their own advantages and disadvantages when measuring the transmission matrix, so the choice should be made based on the specific circumstances.
In addition to the above spatial domain measurement method, transmission matrices can also be measured in the spatial frequency domain. The experimental setup operating in the spatial frequency domain is displayed in Fig. 3(d) [29]. In the spatial frequency domain, the input modes of an MMF consist of a series of plane waves with various incident angles, rather than the point sources arranged in a square grid used in the spatial domain. Correspondingly, a two-axis galvanometer scanning mirror instead of the SLM is used to change the incident angle of the plane waves in the experimental setup. Using the setup presented in Fig. 3(d), the response optical field for each plane wave with angles ( ) can be acquired. This response optical field measured in the frequency domain cannot be directly used for wavefront shaping with SLMs. It is necessary to convert the frequency domain input-output responses into the form of a transmission matrix in the spatial domain using the following equation [29]: where is the output optical field at ( ) for an incident light at ( ). It can be acquired by inverting the matrix . The coordinate system that defines these variables in Eq. (6) is displayed on the left of Fig. 3(d). Based on Eq. (6), the spatial frequency domain measurements can be exploited to focus light in the same way as the transmission matrix measured in the spatial domain.
When the above methods are used to acquire the transmission matrix, the optical response (i.e. one column of the transmission matrix) of each input spatial mode needs to be measured one by one. However, if the employed MMF has perfect cylindrical geometry, the optical responses corresponding to input spatial modes at the same radius should exhibit rotational symmetry, which can be exploited to accelerate the measurement of transmission matrices. Specifically, once the optical response of an input spatial mode is obtained, it can be used to calculate the optical responses of the other spatial modes located on the same circle through a rotation operator, significantly reducing the calibration time. In 2015, Amitonova et al. demonstrated that the rotational memory effect in an MMF can be used for ultrafast high-resolution scanning imaging [51]. Using the wavefront shaping information of one focused light spot, they rotated the focal spot across a 360° range with minimal change in spot size throughout the entire rotation. Unfortunately, the power of the focal spot varied significantly, from 4% to 40%, during the rotation. Further, Li et al. proposed a more general framework describing memory effects in MMFs. They identified a quasi-radial memory effect in MMFs and demonstrated that the transmission matrix can be approximately estimated by placing a guide star at the output end of the MMF as a feedback tool [52]. However, this method is sensitive to the rotational symmetry of the refractive index distribution and the geometry of the MMF. If the MMF is disturbed, only part of the estimated transmission matrix remains valid, specifically the part related to the optical field responses near the guide-star region.
In addition to using interferometric methods to acquire the elements of the transmission matrix, some researchers employ optimization algorithms to obtain the complex transmission matrix using only the measured intensities of the output optical fields. In 2015, Drémeau et al. utilized a Bayesian phase retrieval algorithm to estimate the transmission matrix, leveraging experimental intensity measurements with inherent noise [53]. In 2018, N'Gom et al. proposed a semidefinite program to generate the transmission matrix using intensity-only measurements [54]. They randomly generated some wavefronts and input them into the MMF. By feeding the intensities of the output optical fields into the PhaseCut-based convex optimization algorithm, they obtained the rows of the complex transmission matrix one by one. In the same year, Deng et al. introduced a phase retrieval algorithm based on Bayesian inference to estimate the transmission matrix, offering the potential for acceleration through distributed or parallel computation [55]. In 2019, Caramazza et al. introduced an iterative optimization approach for statistically reconstructing the complex inverse transformation matrix of an MMF based on a physically informed model [56]. In 2020, Huang et al. presented an approach to recursively search for the transformation matrix based on an extended Kalman filter and a modified speckle-correlation scatter matrix [57]. Using these optimization algorithms to acquire the transmission matrix of an MMF has the advantage of a simple calibration setup. However, the fidelity of the transmission matrix obtained by these algorithms is inferior to that acquired by interferometric measurements, and thus the quality of the focal spots is degraded.
Transmission matrices obtained by all the above methods are complex, allowing full modulation of both the amplitude and phase of the optical field. For the sake of simplicity, some works also study real-valued transmission matrices for light focusing through MMFs. The real-valued transmission matrix directly connects the intensities of the input and output modes. This simplified approach was developed specifically for the case in which a DMD is employed as the SLM. As mentioned above, if a DMD is used to conduct complex modulation of the optical field, the Lee hologram method should be adopted. However, since a DMD fundamentally controls the "ON" and "OFF" states of individual micromirrors for wavefront shaping, a binary transmission matrix can also be established directly to relate the focused optical field to the binary states of the micromirrors. In 2015, Tao et al. proposed a method to directly calculate the binary transmission matrix using the measured light intensities in the image plane, which were modulated by a series of masks designed with the Hadamard basis on the DMD [58]. This method does not rely on measured intensities as feedback information for iterative optimization. Therefore, it achieved a very short calibration time, i.e. 75 ms for 1024 input modes. In 2018, Zhao et al. presented a high-speed algorithm based on Bayes' theorem to estimate the binary transmission matrix [59]. The conditional probability that each micromirror contributes positively to the desired focal spot when it is in the "ON" state is calculated. Then, only those micromirrors whose probabilities exceed a defined threshold are set to the "ON" state. This method can calibrate one row of the transmission matrix in 0.26 s for 5000 input-output pairs. In 2020, they further reported a real-valued intensity transmission matrix to characterize the relationships between the input and output light intensities of an MMF [27]. This transmission matrix was derived based on an assumption of pseudo-linearity between the input and output intensity distributions. In 2021, they provided a comprehensive comparison of several transmission matrix estimation approaches [60]. Compared to complex modulation using a DMD, binary intensity modulation utilizes light energy more efficiently, while the produced focal spots exhibit inferior contrast. Besides, the system employed for the calibration of the binary transmission matrix is simpler.
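The idea of selecting micromirrors by a conditional probability can be illustrated with a toy simulation. The sketch below is only loosely inspired by the probability-based approach described above and is not the algorithm of ref. 59: the fiber response `t_row`, the use of the median as the "bright" criterion, and the 0.5 threshold are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n_mirrors, n_patterns = 128, 20000

# Hypothetical complex contribution of each micromirror to the target spot.
t_row = (rng.normal(size=n_mirrors)
         + 1j * rng.normal(size=n_mirrors)) / np.sqrt(2)

# Calibration data: random binary ON/OFF patterns and the resulting
# intensity at the target position (intensity-only detection).
patterns = rng.integers(0, 2, size=(n_patterns, n_mirrors))
focus_intensity = np.abs(patterns @ t_row) ** 2

# Label a measurement "bright" if it exceeds the median intensity, then
# estimate P(bright | mirror ON) for every mirror by simple counting.
bright = focus_intensity > np.median(focus_intensity)
p_bright_given_on = patterns[bright].sum(axis=0) / patterns.sum(axis=0)

# Keep only the mirrors that make a bright focus more likely than not.
mask = (p_bright_given_on > 0.5).astype(int)

focused = np.abs(mask @ t_row) ** 2
typical = focus_intensity.mean()
print(f"mirrors switched ON: {int(mask.sum())} of {n_mirrors}")
print(f"focus intensity / mean random-pattern intensity: {focused / typical:.1f}")
```

The calibration needs no interferometer and no iterative feedback loop, which matches the simplicity argument made in the text; the price is that only about half of the mirrors contribute and the contrast is lower than with full complex modulation.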
After measuring the transmission matrix, the appropriate mask is calculated for the SLM to reshape the wavefront of the input optical field. Ideally, we can set $E_{\mathrm{out}}$ as the optical field distribution of the desired output focal spot, and the corresponding mask can be obtained by calculating the inverse of the measured transmission matrix [29]:
$$E_{\mathrm{in}} = T^{-1}E_{\mathrm{out}}. \tag{7}$$
However, noise is inevitably present in practical measurements. In such a situation, direct matrix inversion becomes unstable because small singular values of the measured matrix amplify the noise, leading to a significant deviation of the calculated mask from the expected outcome. To address this problem, a more general scheme is to calculate the mask with optimization methods. When the mean square optimized operator is adopted, the mask can be calculated by [31]:
$$E_{\mathrm{in}} = \left(T^{\dagger}T + \sigma I\right)^{-1}T^{\dagger}E_{\mathrm{out}}, \tag{8}$$
where $\sigma$ is the standard deviation of the noise, $I$ is an identity matrix and $T^{\dagger}$ is the conjugate transpose of $T$. Based on Eq. (8), the incident optical field with the desired wavefront can be shaped using an SLM. In the case of pure phase modulation, the amplitude of the incident optical field is simply ignored and its phase distribution is used as the modulation mask. Notably, when amplitude modulation is not employed and all modes are set to a constant amplitude, the focusing efficiency decreases to approximately 78.5% (π/4) [23]. Besides, if the input polarization is not controlled, only 50% of the input modes, on average, can interfere constructively at the designated focal point. Therefore, some researchers designed complex optical paths to measure the optical response of different polarizations in the MMF and subsequently exploited the polarization information to achieve a higher focusing efficiency [23,25,61].
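Both the regularized mask calculation and the phase-only efficiency figure can be reproduced in a few lines. The sketch below assumes the regularized operator has the form $(T^{\dagger}T + \sigma I)^{-1}T^{\dagger}$, models the fiber as a random complex Gaussian matrix, and adds synthetic calibration noise; all sizes and the noise level are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)

def complex_gaussian(shape):
    """Circular complex Gaussian samples with unit variance."""
    return (rng.normal(size=shape) + 1j * rng.normal(size=shape)) / np.sqrt(2)

n_in, n_out = 256, 64
sigma = 0.05                                   # assumed noise level
T_true = complex_gaussian((n_out, n_in))
T_meas = T_true + sigma * complex_gaussian((n_out, n_in))   # noisy calibration

def mso_operator(T, sigma):
    """Regularized inverse W = (T^H T + sigma I)^-1 T^H."""
    n = T.shape[1]
    return np.linalg.solve(T.conj().T @ T + sigma * np.eye(n), T.conj().T)

# Mask for a focus at one output mode, evaluated on the true fiber.
target = 10
e_out = np.zeros(n_out, dtype=complex)
e_out[target] = 1.0
e_in = mso_operator(T_meas, sigma) @ e_out
output_intensity = np.abs(T_true @ e_in) ** 2
print(f"brightest output mode: {int(np.argmax(output_intensity))}")

# Phase-only versus full complex modulation for one row of T,
# both normalized to unit input power.
t = complex_gaussian(4096)
full = np.conj(t) / np.linalg.norm(t)
phase_only = np.exp(-1j * np.angle(t)) / np.sqrt(t.size)
ratio = np.abs(t @ phase_only) ** 2 / np.abs(t @ full) ** 2
print(f"phase-only efficiency relative to full modulation: {ratio:.3f}")
```

For Rayleigh-distributed mode amplitudes the ratio approaches π/4, consistent with the roughly 78.5% figure quoted above.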

Digital phase conjugation method
Phase conjugation is an effective technique to eliminate the image distortion caused by inhomogeneous media during transmission [62−64]. Traditional optical phase conjugation techniques were primarily implemented using photorefractive crystals and had been applied to image transmission through MMFs for several decades [65]. However, this method has several drawbacks. Photorefractive crystals are wavelength-selective and have a slow response time. Besides, the optical phase conjugation setup is complex, often requiring two identical optical fibers, which can be challenging to achieve in practical applications [66]. Recently, the emergence of pixelated SLMs has made the digital phase conjugation technique a preferred choice for wavefront correction.
In 2011, Papadopoulos et al. first applied the digital phase conjugation technique to focus light through an MMF [22,67]. A typical experimental setup to realize digital phase conjugation in an MMF is displayed in Fig. 4(a). The implementation generally involves four steps, as illustrated in Fig. 4(b). Firstly, a light spot is focused through an objective onto the proximal facet of the MMF as the excitation. Secondly, the excitation light propagates through the MMF, generating a speckle pattern at the output end. The output optical field interferes with a reference light beam, resulting in an interference pattern, which is recorded on a CMOS detector. Thirdly, the recorded interference pattern is used to numerically reconstruct the amplitude and phase of the output optical field [68,69]. Finally, the calculated phase is loaded onto the SLM for wavefront reshaping. As shown in Fig. 4(a), a portion of the reference light is separated and modulated with the phase mask on the SLM before being reflected into the MMF. After this conjugate wave travels back through the MMF, a focused light spot is formed at the initial input facet. Using the digital phase conjugation technique, Papadopoulos et al. achieved a focused beam with a waist of 2.25 μm and an enhancement factor of 1800. They also conducted validation experiments for scanning endoscopic imaging using fluorescent beads and biological samples [70]. In 2018, Ma et al. exploited the rotational memory effect of MMFs to reduce the number of conjugated phase masks to be measured and thus boost the calibration speed [69]. They achieved focused light spots with a full-width at half-maximum (FWHM) of 1.3 μm and an enhancement factor of 900.
Despite the substantial potential of the digital phase conjugation technique in the MFSE, its practical use is constrained by several significant problems. Firstly, precise pixel-to-pixel alignment between the SLM and the image sensor is required, which is the prerequisite for forming the correct phase conjugate beam. Secondly, the phase conjugate beam and the initial light beam output from the MMF need to be precisely superimposed. These two alignment problems are challenging and critical to generating high-quality focal spots. To facilitate the alignment between the SLM and the CMOS camera, Ma et al. suggested adding a 4f system between them [69]. This modification simplifies the alignment process by imaging the grayscale pattern from the SLM onto the CMOS camera. Mididoddi et al. later proposed a transmission matrix-assisted calibration method to relax the stringent alignment criteria between the SLM and the CMOS camera [71]. To achieve a good overlap between the phase conjugate beam and the light beam output from the MMF, the tilt of mirror M3 in Fig. 4(a) should be well adjusted. Jang et al. also proposed an auto-alignment method for the digital phase conjugation system using a digital propagation method [72]. In any case, maintaining the alignment requires high mechanical and thermal stability of the imaging system based on digital phase conjugation.
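The playback step and its sensitivity to pixel-to-pixel alignment can be illustrated with an idealized model. The sketch assumes a lossless fiber represented by a random unitary matrix `T` (hypothetical), backward propagation described by the transpose of `T` (reciprocity), and an SLM that reproduces the full complex conjugate field rather than the phase only; the one-pixel shift mimics a camera/SLM misalignment.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 256

# Idealized lossless fiber: a random unitary matrix obtained by QR.
g = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
T, _ = np.linalg.qr(g)

# Steps 1-2: a point source at mode k produces a speckle on the camera side.
k = 42
source = np.zeros(n, dtype=complex)
source[k] = 1.0
recorded = T @ source

# Steps 3-4: display the conjugate field and send it back through the fiber.
playback = np.conj(recorded)
focus = np.abs(T.T @ playback) ** 2

# Same playback, but shifted by one pixel between camera and SLM.
focus_misaligned = np.abs(T.T @ np.roll(playback, 1)) ** 2

print(f"aligned:    power fraction at source mode = {focus[k]:.3f}")
print(f"misaligned: power fraction at source mode = {focus_misaligned[k]:.3f}")
```

In this toy model the aligned playback returns all power to the original mode, whereas a single-pixel shift scatters nearly all of it into the other modes, which is why the alignment aids discussed above matter in practice.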

Phase optimization method
The two methods introduced above rely on intricate optical systems to measure the propagation properties of MMFs and thereby obtain the wavefront shaping masks that generate focal spots. Unlike these two hardware-based methods, the phase optimization method relies on algorithms to search directly for masks that generate the desired focal spots. During the optimization process, the SLM is divided into several subregions. The phase of each subregion is optimized through iterative algorithms, with the focusing intensity or contrast at the desired position as the performance metric. Research in this direction has focused on improving the iterative optimization algorithms, such as the Monte-Carlo algorithm 26, adaptive sequential coordinate ascent algorithm 73, stochastic parallel gradient descent algorithm 74, self-adaptive genetic algorithm 75, and natural evolution strategy 76, to simultaneously reduce the calibration time and acquire more effective masks for generating high-contrast focal spots. These algorithms can be divided into two categories. One uses speckle intensity measurements to guide the search for optimal masks, while the other relies on interference measurements.
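As a minimal sketch of the intensity-based category, the following code optimizes one SLM subregion at a time by trying a few discrete phase values and keeping the one that maximizes the intensity at the target position. It is a generic stepwise coordinate search, not any of the cited algorithms, and it assumes a toy fiber in which each subregion contributes a random complex coefficient to the target; `make_fiber_row`, `target_intensity`, and `optimize_phases` are hypothetical names.

```python
import cmath
import math
import random


def make_fiber_row(n_segments, seed=0):
    """Assumed complex coupling from each SLM subregion to the target output position."""
    rng = random.Random(seed)
    return [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n_segments)]


def target_intensity(row, phases):
    """The only quantity the optimizer is allowed to 'measure'."""
    field = sum(t * cmath.exp(1j * p) for t, p in zip(row, phases))
    return abs(field) ** 2


def optimize_phases(row, n_steps=8, n_sweeps=2):
    """Sequentially pick, for each subregion, the discrete phase giving the brightest focus."""
    n = len(row)
    phases = [0.0] * n
    candidates = [2 * math.pi * k / n_steps for k in range(n_steps)]
    for _ in range(n_sweeps):
        for s in range(n):
            # The current phase is always among the candidates,
            # so the target intensity never decreases.
            phases[s] = max(
                candidates,
                key=lambda p: target_intensity(row, phases[:s] + [p] + phases[s + 1:]),
            )
    return phases


if __name__ == "__main__":
    row = make_fiber_row(64, seed=2)
    before = target_intensity(row, [0.0] * 64)
    after = target_intensity(row, optimize_phases(row))
    print("intensity gain:", round(after / before, 1))
```

Each subregion costs `n_steps` intensity measurements per sweep, which is why reducing the number of measurements per subregion is a central concern of the algorithms listed above.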
In terms of the first category, in 2011, Leonardo et al. first presented the use of the phase optimization method to produce focal spots through an MMF 26. They conducted […] It is demonstrated that the anti-interference and convergence performance of this algorithm is better than that of other algorithms, including the particle swarm optimization method 77, genetic algorithm 78, and improved ant colony algorithm 79. Note that the latter three algorithms used for comparison were initially developed for light focusing through a disordered medium. It can be seen that most algorithms used for focusing light fields in disordered media can be applied to the MMF case. Comprehensive performance comparisons among various algorithms developed for disordered media, which may also be suitable for light focusing through MMFs, can be found in a 2019 study by Fayyaz et al. 80.
The second category of phase optimization techniques is based on interference measurements, which can effectively reduce the total optimization time needed to generate a series of focal spots. The experimental arrangement for these interference-based optimization methods is very similar to that adopted for measuring the transmission matrices of MMFs. In the experiment, the incident and output optical fields are partitioned into a series of square orthogonal modes on the SLM and the output facet of the MMF, respectively. As shown in Fig. 5(a− […] any desired output mode can be determined. Once all input modes have been tested, the phase mask required to focus light at any desired output mode can be extracted. This optimization process simultaneously yields the phase masks required for focusing light at any position on the output facet of the MMF, which distinguishes it from the optimization algorithms based on speckle intensity measurements. Besides, when the external reference pathway is exploited in the optimization procedure as shown in Fig. 5(e−g), the effect of all parameters, including phase, amplitude and polarization, on light focusing can be fully analyzed. Based on the optimization of all these parameters, Čižmár et al. generated focal spots carrying 80% of the total power 23. In 2015, Yin et al. proposed an adaptive parallel coordinate algorithm to reduce the calibration time in the interference-based optimization method 81. Rather than exhaustively recording interference patterns of reference and test light at phase differences varying from 0 to 2π, they employed a three-phase shifting method to compute the relative phase between the reference and test light. The optimal phase of the test light for focusing at a desired position can then be calculated from the interference formula. The three-phase shifting method was previously applied in the adaptive sequential coordinate ascent algorithm to optimize the phase mask 73. However, that algorithm is slower than the adaptive parallel coordinate algorithm. Later, Chen et al. adopted a similar principle to optimize the phase mask with a two-phase shifting method 82,83. It is evident that using phase shifting in conjunction with data processing greatly reduces the time required for experimental measurements compared with phase enumeration. As outlined above, these interference-based phase optimization methods, particularly the adaptive parallel coordinate algorithm, share many similarities with the transmission matrix method in both experimental setup and data processing. Therefore, their calibration time is comparable with that of the transmission matrix method.
Finally, to intuitively compare the focusing performance of MMFs based on different calibration techniques, the properties of focal spots reported in some representative works, along with the experimental details, are summarized in Table 1. The subdivided technical routes include interferometric measurements of the transmission matrix (TM-I), transmission matrix derived by optimization algorithm (TM-A), real-valued intensity transmission matrix (TM-R), digital phase conjugation (PC), digital phase conjugation combined with the rotational memory effect (PC-R), and the phase optimization method based on speckle intensity (PO-S) and interference (PO-I) measurements. Here, we are concerned with two parameters of the focused light spot, i.e. the size and the contrast, which have a significant impact on the scanning imaging performance. The size of a focal spot is mainly measured by its FWHM, while its contrast is characterized by the enhancement factor (η) or the power ratio (PR). The enhancement factor is the ratio between the peak intensity of the focal spot and the average intensity of the background. The PR represents the proportion of optical power contained in the focal spot relative to the total power transmitted through the MMF. Among all techniques, the TM-I method has achieved the highest focal spot contrast using complex modulation. More details about how to achieve this optimal focusing performance are discussed in Section Scanning imaging performance. Although techniques based on non-interferometric measurements, such as TM-A, TM-R, and PO-S, do not yield very good focal contrast, they offer simple measurement setups and high temporal stability. These methods are budget-friendly options in scenarios where high imaging performance is not necessary.
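The three-phase shifting step used by the interference-based optimization methods in this section can be sketched with the standard two-beam interference formula, I(δ) = |R|² + |S|² + 2|R||S|cos(φ + δ), where φ is the phase of the test field S relative to the reference R and δ is the phase added to the test light. The shifts 0, 2π/3, and 4π/3 and the function names are assumptions chosen for illustration; the cited works may use different shift values.

```python
import cmath
import math

# Assumed phase shifts applied to the test light.
SHIFTS = (0.0, 2 * math.pi / 3, 4 * math.pi / 3)


def measure_intensity(reference, test, shift):
    """Simulated detector reading for one phase shift of the test light."""
    return abs(reference + test * cmath.exp(1j * shift)) ** 2


def relative_phase(i0, i1, i2):
    """Phase of the test light relative to the reference, from three intensities.

    Summing I_k*cos(shift_k) isolates cos(phi); summing I_k*sin(shift_k)
    isolates -sin(phi); the constant background cancels in both sums.
    """
    intensities = (i0, i1, i2)
    c = sum(i * math.cos(d) for i, d in zip(intensities, SHIFTS))
    s = sum(i * math.sin(d) for i, d in zip(intensities, SHIFTS))
    return math.atan2(-s, c)


if __name__ == "__main__":
    reference = 1.5 * cmath.exp(0.3j)
    test = 0.7 * cmath.exp(2.0j)
    phi = relative_phase(*(measure_intensity(reference, test, d) for d in SHIFTS))
    # Applying -phi to the test light makes it interfere constructively.
    print(phi, measure_intensity(reference, test, -phi))
```

Three readings replace a full sweep of the phase from 0 to 2π, which is the source of the time saving described above.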

All-optical scanning imaging
Once the MFSE has been calibrated with one of the techniques introduced above, the masks that shape the wavefront to generate focal spots can be acquired and stored for subsequent scanning imaging. The experimental setup exploiting the stored masks for all-optical scanning imaging is displayed in Fig. 6(a). Note that this setup is for the most common reflection or fluorescence imaging mode; setups for other imaging modalities will be discussed in Section Different scanning imaging modalities. The optical path used for calibration with different techniques is also omitted from this schematic. The light beam from the laser is expanded to cover the required modulation region on the SLM. Then, the wavefront of the light beam is modulated by the SLM with a certain stored mask. Subsequently, the modulated light beam in the first diffraction order passes through an aperture and is coupled into the MMF through an objective lens. After the light beam exits the MMF, it forms a focal spot on a plane determined by the calibration. The object illuminated by the focused light spot reflects light or emits fluorescence, which is coupled back into the MMF and propagates in the opposite direction to the illumination light. To direct the signal light into the photodetector, a beam splitter or a dichroic mirror is placed at the proximal end of the MMF for the reflection or fluorescence imaging mode, respectively.
The all-optical scanning imaging procedures are outlined in Fig. 6(b). The stored masks are sequentially loaded onto the SLM, allowing focal spots to be generated at different positions in a temporal sequence for scanning imaging. Generally, these sequentially generated focal spots scan the sample in a raster pattern. Figure 6(c) presents a typical combined image of sequentially generated focal spots at the distal end of the MMF. The focal spots are evenly distributed on the grid points of a square lattice to sample the object. Then, the image of the object is synthesized from the time-varying intensity signals measured with the photodetector together with the calibrated positions of the corresponding focal spots. As an example, the synthesized image of a resolution target is shown in Fig. 6(d).
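The image synthesis step described above amounts to placing each photodetector reading at the calibrated position of the focal spot that produced it. The sketch below assumes an ideal point-like focal spot, so that the detector signal simply equals the object reflectivity at the spot; all function names are hypothetical.

```python
def raster_positions(n_rows, n_cols):
    """Calibrated focal-spot positions on a square lattice, in raster order."""
    return [(r, c) for r in range(n_rows) for c in range(n_cols)]


def simulate_scan(obj, positions):
    """Stand-in for the hardware loop: load mask k onto the SLM, read the detector.

    Assumes an ideal focal spot, so the reading is the object value at that spot.
    """
    return [obj[r][c] for r, c in positions]


def synthesize_image(signals, positions, shape):
    """Rebuild the image from the time series of detector signals."""
    n_rows, n_cols = shape
    image = [[0.0] * n_cols for _ in range(n_rows)]
    for value, (r, c) in zip(signals, positions):
        image[r][c] = value
    return image


if __name__ == "__main__":
    obj = [[0.0, 1.0, 0.0, 1.0],
           [1.0, 1.0, 0.0, 0.0],
           [0.0, 0.5, 0.5, 1.0]]
    positions = raster_positions(3, 4)
    print(synthesize_image(simulate_scan(obj, positions), positions, (3, 4)))
```

In a real system each reading is a weighted average of the object over the focal spot and its speckle background, which is why spot size and contrast govern the image quality.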
From the above description, it is evident that the position of the focal spot in the MFSE is changed by purely optical means, i.e. wavefront shaping with various masks applied to the SLM. All the imaging-assistive elements, such as the SLM, are positioned at the proximal end of the MMF. Consequently, the footprints of MFSEs are solely determined by the diameters of MMFs, offering a notable advantage over the single-mode fiber scanning endoscope, which relies on a mechanical actuator located at the distal end of the fiber to conduct scanning imaging. Due to the reduced probe size, the MFSE holds great potential for minimally invasive biomedical applications. The single-mode fiber scanning endoscope, being a mature and well-established technology, also has notable advantages over MFSEs, including strong resistance to external disturbance and faster scanning imaging speed. The shortcomings of MFSEs in these two aspects are major factors that have hindered their widespread practical application. To harness the potential of MFSEs in terms of small footprints, it is necessary to overcome these practical challenges.

Scanning imaging performance
The scanning imaging performance of the MFSE, a promising imaging technique, has garnered significant attention from researchers. Currently, the primary performance concerns for the MFSE are imaging resolution, imaging contrast, scanning imaging speed, system stability, working distance and FOV. Many strategies have been suggested to enhance these performance aspects. Below, we provide an overview of these strategies. Because the working distance and FOV fundamentally constrain the imaging resolution and contrast, the developments related to the working distance and FOV are not discussed in separate sections. Instead, they are covered in the discussion of imaging resolution and contrast in Sections Imaging resolution and Imaging contrast.

Imaging resolution
Imaging resolution determines how small an object an imaging system can distinguish. Since the MFSE is a scanning imaging device, its imaging resolution is primarily determined by the sizes of the focal spots, which are affected by factors stemming from both the experimental implementation and fundamental physical principles. Regarding the experimental implementation, the fidelity of the wavefront shaping mask acquired by a given calibration technique is important. If the wavefront shaping mask cannot accurately compensate for the modal scrambling of the MMF, the focused light spot spreads and cannot reach the minimum size. In the imaging process, whether the modulated optical field is coupled into the MMF at the correct spatial position also has a significant impact on the focal spot. Nevertheless, these issues can be addressed by carefully arranging the experiments. As shown in Table 1, most focal spots reported in these representative works achieved a diffraction-limited size. Therefore, the more challenging factors that restrict the sizes of focal spots are imposed by fundamental physical principles, including the numerical aperture (NA) of the MMF probe and the working distance.
We first introduce the influence of NA on imaging resolution and the related advancements. MMFs with a larger NA support the propagation of more eigenmodes and thereby provide more controllable degrees of freedom to produce smaller focused light spots. The NAs of commonly available MMFs typically range from 0.2 to 0.5, so the imaging resolutions achieved in most works are larger than 1 μm, as displayed in Table 1. To further improve the imaging resolution, several approaches have been proposed to increase the NA. In 2013, Choi et al. and Papadopoulos et al. found that placing a highly scattering medium in front of an MMF can physically increase the NA of the entire imaging system. Choi et al. coated a layer of ZnO nanoparticles on the surface of the MMF and found that the effective NA of the system increased from 0.22 to 0.4 85. Papadopoulos et al. employed a layer of 20 μm white paint deposited on a glass slide as the scattering medium. They found that the size of the focused light spot decreases as the distance between the scattering medium and the MMF increases 86. When the light field is focused 800 μm away from the scattering medium, the size of the focal spot remains smaller than 1 μm, exhibiting a significant improvement in both NA and working distance. The increased NA is attributed to both the enlargement of the physical output aperture and the enhanced coupling from high- to low-order modes that can propagate through the MMF. However, the use of scattering media significantly increases the transmission loss and degrades the collection efficiency of the reflected signal. Jang et al. later adopted a holistic random encoding method to improve the signal-to-noise ratio of image reconstruction 87.
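The link between NA and the number of eigenmodes can be made concrete with the textbook step-index estimate: the normalized frequency is V = 2πa·NA/λ for core radius a, and a highly multimode step-index fiber guides roughly V²/2 modes counting both polarizations. The sketch below applies that estimate; the 25 μm core radius and 532 nm wavelength are illustrative assumptions, not values taken from the cited works.

```python
import math


def v_number(core_radius_um, na, wavelength_um):
    """Normalized frequency V of a step-index fiber."""
    return 2 * math.pi * core_radius_um * na / wavelength_um


def approx_mode_count(v):
    """Large-V estimate of guided modes in a step-index fiber (both polarizations)."""
    return v * v / 2


if __name__ == "__main__":
    # Assumed example: 50 um core diameter, 532 nm light.
    for na in (0.22, 0.4):
        v = v_number(25.0, na, 0.532)
        print(f"NA = {na}: V = {v:.1f}, about {approx_mode_count(v):.0f} modes")
```

Because the estimate scales with NA², raising the effective NA from 0.22 to 0.4 increases the available degrees of freedom by a factor of about 3.3 at fixed core size and wavelength.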
The other method to increase the NA is assembling micro-optic components on the facet of the MMF. In 2013, Bianchi et al. fabricated a micro-parabolic reflector on the facet of an MMF using the two-photon polymerization technique 88. The NA of the obtained device is close to 1, and it can produce a focal spot with an FWHM of ~300 nm. Apart from these creative innovations, one of the most direct approaches to increasing NA is leveraging specialized optical fibers. In 2016, Amitonova et al. demonstrated that well-designed multimode photonic crystal fibers (PCF) possess an NA of ~0.6, which is larger than those of commonly available MMFs and thus enables the focal spot to have a subwavelength size 89. However, these hollow PCFs are not well-suited for operation in an aqueous environment. Later, Leite et al. fabricated an all-solid soft-glass MMF that achieves an NA exceeding 1 for wavelengths below 620 nm and reaching 0.96 at 1064 nm 90, showcasing significant promise in enhancing imaging resolution.
Working distance is another fundamental factor limiting the sizes of focal spots and thus the imaging resolution. It is the distance between the focusing plane and the fiber facet. Because the aperture of a specific MMF is fixed, the maximum propagation angle of the light rays that contribute to the focal spot decreases as the working distance increases, which means the effective NA of the MMF exploited for light focusing is smaller than its intrinsic NA 70. As the effective NA becomes smaller with increasing working distance, the sizes of focal spots inevitably increase. In some reported studies, long-distance three-dimensional (3D) scanning imaging using MMF has been achieved based on the time-of-flight technique and optimized phase modulation strategies. However, these techniques can only enhance the contrast of focal spots and the efficiency of collecting signals at long distances. The imaging resolutions in these works were significantly degraded to the millimeter scale 91,92. The above-mentioned work that used a glass slide coated with white paint as the scattering medium to increase the NA can simultaneously improve the imaging resolution and increase the working distance 86. However, since a glass slide with a large aperture had to be positioned at the distal end of the MMF, the most remarkable advantage of the MFSE, i.e. an ultrathin probe, is sacrificed. Therefore, the trade-off between working distance and imaging resolution is still a challenge in this field. In experiments, the imaging plane must be chosen properly to balance the imaging resolution and working distance.
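The trade-off described above can be estimated with a simple geometric sketch. It assumes an on-axis focus in air, a core of radius a acting as the limiting aperture, so that the effective NA is the smaller of the intrinsic NA and sin(arctan(a/z)) at working distance z, and a diffraction-limited spot with FWHM ≈ 0.51λ/NA (the Airy-pattern value for a uniformly filled circular aperture). The function names and example numbers are illustrative assumptions.

```python
import math


def effective_na(core_radius_um, fiber_na, working_distance_um):
    """Geometric estimate of the NA usable for an on-axis focus in air."""
    if working_distance_um <= 0:
        return fiber_na
    geometric_na = math.sin(math.atan(core_radius_um / working_distance_um))
    return min(fiber_na, geometric_na)


def focal_spot_fwhm_um(na, wavelength_um):
    """Diffraction-limited FWHM for a uniformly filled circular aperture."""
    return 0.51 * wavelength_um / na


if __name__ == "__main__":
    # Assumed example: 25 um core radius, intrinsic NA 0.22, 532 nm light.
    for z in (50.0, 200.0, 500.0):
        na = effective_na(25.0, 0.22, z)
        print(f"z = {z:5.0f} um: NA_eff = {na:.3f}, "
              f"FWHM = {focal_spot_fwhm_um(na, 0.532):.2f} um")
```

Under these assumptions the intrinsic NA is available only close to the facet; beyond that range the spot size grows roughly in proportion to the working distance.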
Furthermore, it should be noted that the uniformity of the focal spots produced at different raster-scanning positions is also a key factor affecting the overall imaging resolution. In some situations, the MMF can only achieve diffraction-limited focusing in the region near the center of the core, while the focal spot in the region closer to the cladding becomes elliptical 93. Descloux et al. demonstrated that the number of modes required for diffraction-limited focusing at the edge is much larger than that required for central focusing 94. Therefore, to achieve diffraction-limited focusing at the edge, the phase mask must be sufficiently complex to fully compensate for the mode dispersion, which implies that more segments on the SLM are needed. They suggested that a minimum of […] segments (V is the normalized frequency of the MMF) should be employed to ensure uniform raster-scanning imaging 94. When the number of SLM segments is limited, Lyu et al. suggested coupling light closer to the core's edge to acquire a round focal spot for scanning imaging 93. Besides, it was found that an MMF with a square core can focus light more uniformly at different positions than an MMF with a round core. Therefore, this novel MMF is a competitive candidate to replace the traditional round-core MMF for scanning imaging 95. Moreover, computational algorithms based on convolution and deconvolution can also be used to improve the image resolution 96.
The methods discussed above mainly improve the imaging resolution by modifying the MMF probe or optimizing the experimental implementation. In addition, integrating the MMF with other advanced scanning imaging modalities, such as confocal endoscopy and two-photon imaging, can also enhance the imaging capacity 97−99. We will introduce these advancements in detail in Section Different scanning imaging modalities.

Imaging contrast
Imaging contrast is another crucial parameter for an imaging system 100. The scanning imaging process of the MFSE comprises two sequential steps: focusing the optical field to create an illumination spot on the sample and capturing the signal light reflected or emitted by the sample. Several factors have substantial impacts on the imaging contrast in both steps.
Regarding optical field focusing, the contrast of the focal spots generated through an MMF is a determining factor. As mentioned in Section Principle of multimode optical fiber scanning endoscope, the contrast of the focal spot is characterized using two metrics, i.e. the enhancement factor (η) and the power ratio (PR). According to calculations, if an MMF supports 1000 eigenmodes, the PR value of the generated focal spot can exceed 99% 101. Therefore, PR values obtained in experiments that fall below this theoretical value are attributed to imperfect experimental implementation rather than physical limitations. To fully utilize the numerous eigenmodes of an MMF for light focusing, it is essential to ensure that the number of input modes controlled by the SLM is comparable with the number of eigenmodes, as already mentioned in the preceding analysis of the uniformity of focal spots. Within a certain range, the more input modes we can control, the higher the PR of the focal spot will be. However, as the number of input modes increases, the number of pixels contained in each segment partitioned on the SLM inevitably decreases, which may unexpectedly reduce the PR when an LC-SLM is utilized. This reversal results from the severe scattering noise present within the LC-SLM. The scattering noise seriously degrades the fidelity of the phase mask when the intensity of each input mode is low, consequently diminishing the quality of the focal spot 102. This issue is less pronounced with a DMD, which is one of the reasons for its growing popularity.
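As a minimal sketch of how the two contrast metrics can be evaluated from a measured intensity map, the code below takes the brightest pixel as the focus and treats every pixel within a chosen radius of it as belonging to the focal spot. The radius criterion and the function name `focus_metrics` are assumptions made for illustration; published works may delimit the focal spot differently.

```python
def focus_metrics(intensity, spot_radius):
    """Return (enhancement factor, power ratio) for a 2D intensity map.

    Enhancement factor: peak intensity / mean background intensity.
    Power ratio: power inside the focal spot / total power in the map.
    """
    n_rows, n_cols = len(intensity), len(intensity[0])
    pixels = [(r, c) for r in range(n_rows) for c in range(n_cols)]
    peak_r, peak_c = max(pixels, key=lambda rc: intensity[rc[0]][rc[1]])

    spot_power = 0.0
    background_power = 0.0
    background_count = 0
    for r, c in pixels:
        if (r - peak_r) ** 2 + (c - peak_c) ** 2 <= spot_radius ** 2:
            spot_power += intensity[r][c]
        else:
            background_power += intensity[r][c]
            background_count += 1
    if background_count == 0:
        raise ValueError("spot_radius covers the whole map; no background left")

    eta = intensity[peak_r][peak_c] / (background_power / background_count)
    power_ratio = spot_power / (spot_power + background_power)
    return eta, power_ratio


if __name__ == "__main__":
    # Uniform unit background with a single bright pixel in the center.
    speckle = [[1.0] * 5 for _ in range(5)]
    speckle[2][2] = 100.0
    print(focus_metrics(speckle, spot_radius=0))
```

The example shows why the two metrics are not interchangeable: a spot can stand far above the mean background and still carry well under 100% of the power when the background covers many pixels.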
When enough controllable segments are available, the other key point is to perform full complex modulation of the optical field, including phase, polarization, and amplitude. According to the analysis of Čižmár et al., lacking polarization and amplitude control may result in ~50% and ~21.5% of the available optical power, respectively, being randomly dispersed across the output area as noise 23. To control the two polarizations of light, the transmission matrices of both polarizations should be measured individually, and the mask loaded onto the SLM is a superposition of the masks designed for the independent modulation of each polarization. If only one polarization is controlled, a single circular polarization yields a better PR than a linear polarization. The amplitude modulation is achieved by manipulating the diffraction grating on the modulator to prevent the dominant noise-contributing modes from coupling into the fiber. However, this comes at the cost of reducing the overall output power. By optimizing the experimental setup and employing complex modulation, Gomes et al. achieved focal spots with a PR of ~96%, which is the highest value reported to date 84.
In addition to the two fundamental requirements, there are numerous experimental details that must be carefully considered in both the calibration and imaging procedures to achieve near-perfect focusing. In the process of calibration, different techniques involve various experimental setups and thus the origins of mask measurement errors vary. We have already clarified the key points for each calibration technique in Section Principle of multimode optical fiber scanning endoscope. In terms of the imaging process, Gomes et al. have comprehensively analyzed the important experimental details that may affect the focusing quality, such as the propagation angle and polarization of the incident light relative to the DMD, the wavefront aberration originating from the DMD curvature, and the ghost diffraction orders of the DMD 84. When all these issues are addressed properly in experiments, near-perfect focusing is possible. Moreover, Tučková et al. proposed two algorithms based on the regularized iterative inversion and the regularized direct pseudo-inversion to enhance the image contrast by post-processing the obtained image with the original intensity distribution of speckle-contaminated focal spots 101. In case the number of controllable segments on the SLM is insufficient, Zhang et al. introduced a wavelength modulation method to suppress background noise and thus improve the imaging contrast 103.
Regarding signal collection, the NA of the MMF is the key factor influencing imaging contrast. Studies on increasing the NA of MMFs have already been introduced in detail in Section Imaging resolution. In addition to the optimizations mentioned above for the most common MFSE system working in reflection or fluorescence imaging mode, it is also possible to combine this endoscope technique with other advanced imaging techniques, such as confocal and two-photon imaging, to further enhance the imaging contrast. Detailed explanations are provided in Section Different scanning imaging modalities.

Scanning imaging speed
For an endoscopic technique aimed at biomedical applications, the imaging speed is one of the key metrics determining its practicality. Ideally, no repeated calibration is needed during the MMF scanning imaging process. Instead, the wavefront reshaping masks recorded during the calibration stage for generating different focal spots are sequentially transferred to the SLM to complete the all-optical scanning. Therefore, the imaging speed mainly depends on the refresh rate of the SLM and the communication speed between the SLM and the computer. So far, the mask transfer time between the computer and the SLM has been reduced to negligible levels by utilizing a powerful modern Graphics Processing Unit (GPU) 2,104,105. As such, the refresh rate of the SLM is currently the most critical factor limiting imaging speed.
In the early stage, LC-SLMs were widely employed owing to their excellent wavefront reshaping performance. However, the typical refresh rate of LC-SLMs is less than 200 Hz, which is too low for imaging moving objects. Pairing LC-SLMs with faster beam-steering devices, such as acousto-optic deflectors and galvanometer mirrors, can markedly boost the acquisition speed but considerably sacrifices the maximum pixel resolution 25. Later, the DMD, with a refresh rate exceeding 20 kHz, became the favored optical field modulation device. Although the DMD is a binary-amplitude modulation device, it can achieve complex modulation of both amplitude and phase using the Lee hologram method 43,46,106. So far, MFSE systems employing a DMD as the SLM have reached near-video-rate imaging of the deep brain of living anesthetized mice (Fig. 7) and of moving humans 42,107. Besides, DMDs also exhibit higher wavefront reshaping fidelity than LC-SLMs because DMDs do not suffer from the strong scattering and flicker effects that LC-SLMs encounter 102,108. The DMD also has disadvantages, such as a lower diffraction efficiency than LC-SLMs, which renders it unsuitable for applications requiring high intensity 108,109. In summary, the DMD is currently the optimal choice for the MFSE from the perspective of imaging speed.
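Since each image pixel requires one mask to be displayed, the frame time in the ideal, calibration-free case is simply the number of scan points divided by the modulator refresh rate. The short calculation below uses the 200 Hz and 20 kHz figures quoted above; the 100 × 100 scan grid is an assumed example, and mask transfer and detector integration times are neglected.

```python
def frame_time_s(n_rows, n_cols, refresh_rate_hz):
    """Time to display one mask per scan point, ignoring transfer and settling time."""
    return n_rows * n_cols / refresh_rate_hz


if __name__ == "__main__":
    for name, rate_hz in (("LC-SLM", 200), ("DMD", 20_000)):
        t = frame_time_s(100, 100, rate_hz)
        print(f"{name}: {t:g} s per 100 x 100 frame ({1 / t:g} frames per second)")
```

For this assumed grid the frame time drops from 50 s at 200 Hz to 0.5 s at 20 kHz, which illustrates why the refresh rate dominates the achievable imaging speed and why smaller scan grids are needed to approach video rate.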
The above analysis is based on the ideal scenario where the MMF remains undisturbed during use. However, in most practical applications, movement or bending of the MMF is inevitable, which renders the previously calibrated masks ineffective. The relevant studies are introduced in detail in the next section. If frequent recalibration is necessary during use, the imaging speed will be significantly reduced. How to further boost the imaging speed in this scenario remains a significant challenge for the MFSE system.

System stability
In practical use, the stability of the entire MFSE system is of utmost importance. However, both the MMF and the auxiliary optical paths within the system are susceptible to external dynamic disturbances, which make the imaging performance unstable. The stability of the MFSE system is primarily assessed by comparing the light transmission characteristics of the system before and after external disturbances. Several parameters, including the normalized focus intensity, the correlation coefficient of the MMF speckles and the structural similarity index of the obtained images, have been employed to quantify the system stability. When the system is subjected to external disturbances, the quality of the focused light spot produced by the MFSE may change, thereby affecting the scanning imaging performance. Caravaca-Aguirre et al. used the normalized focus intensity to evaluate the stability of the MFSE system under displacement disturbances 110. Besides, the MMF typically generates a speckle pattern at its output end. It is therefore also feasible to measure the stability of the system by calculating the correlation coefficient of the speckles generated before and after external disturbances under the same light field excitation 111−113. Furthermore, the structural similarity index of scanning imaging results obtained in different conditions has also been employed to evaluate the system stability 113,114. Next, we will discuss the effects of environmental disturbances on the MMF and its auxiliary optical paths in imaging, along with the current solutions.
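Two of the stability metrics named above are straightforward to compute from recorded data. The sketch below assumes that the speckle patterns are supplied as flattened lists of pixel intensities and uses the Pearson correlation coefficient as the speckle correlation; the cited works may normalize differently, and the function names are hypothetical.

```python
import math


def normalized_focus_intensity(intensity_after, intensity_before):
    """Focus intensity after a disturbance relative to the undisturbed value."""
    return intensity_after / intensity_before


def speckle_correlation(before, after):
    """Pearson correlation between two speckle intensity patterns of equal length."""
    n = len(before)
    mean_b = sum(before) / n
    mean_a = sum(after) / n
    cov = sum((x - mean_b) * (y - mean_a) for x, y in zip(before, after))
    var_b = sum((x - mean_b) ** 2 for x in before)
    var_a = sum((y - mean_a) ** 2 for y in after)
    if var_b == 0 or var_a == 0:
        raise ValueError("correlation is undefined for a constant pattern")
    return cov / math.sqrt(var_b * var_a)


if __name__ == "__main__":
    reference = [0.2, 1.4, 0.7, 3.1, 0.1, 0.9]
    dimmer = [0.5 * x for x in reference]  # same pattern, lower power
    print(speckle_correlation(reference, dimmer))
    print(normalized_focus_intensity(0.8, 1.0))
```

A correlation near 1 indicates that the fiber still scrambles light in the same way, regardless of overall power changes, whereas a value near 0 indicates that the stored masks no longer match the fiber state.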
It is well known that bending, twisting, and temperature changes of the MMF can significantly alter its modal properties, so that the wavefront shaping masks obtained through calibration are no longer suitable for focusing in the new condition. To overcome this challenge, there are currently two alternative strategies. The first strategy is to shield the MMF from external disturbances or to develop MMFs that are insensitive to disturbances. For example, Papadopoulos et al. encapsulated the MMF within a 25-gauge needle to prevent it from bending, as displayed in Fig. 8(a) 70. However, this method considerably diminishes the benefits of the MFSE in terms of flexibility and an ultrathin probe, which are preferred in many biomedical applications. As for disturbance-insensitive fibers, graded-index MMFs are widely acknowledged to offer superior resistance to external disturbances compared with step-index MMFs. At a fundamental level, external disturbances affect the scanning imaging performance of MMFs by altering the mode propagation constants and inducing mode coupling. According to mode-coupling theory, the mode-coupling coefficients of graded-index MMFs are significantly smaller than those of step-index MMFs 115. As such, the power coupling between modes induced by external disturbances is stronger in step-index MMFs than in graded-index MMFs. The weak mode coupling of graded-index MMFs makes them more resistant to external disturbances 110. Flaes et al. also experimentally demonstrated that the imaging performance of the graded-index MMF is indeed less affected by deformation than that of step-index MMFs 116. Moreover, they proved that a graded-index MMF featuring a precisely parabolic refractive index profile is highly resistant to the effect of bending. Bending deformations in such MMFs cause almost no mode coupling and are instead equivalent to slight fiber stretching. However, the fabrication of ultraprecise parabolic-index fibers is still a challenge. Further, Loterie et al. compared the propagation characteristics of different MMFs under the disturbance of bend translation and found that MMFs with small core sizes and large NAs are insensitive to such disturbance 117. The bend in these MMFs can be translated by up to 25 mm with negligible impact on the effectiveness of the originally calibrated masks. While these intriguing findings enhance the operational flexibility of MMFs to some extent, the demand for unrestricted manipulation of the MMF in practical applications still presents significant challenges.
The other strategy is to perform real-time recalibration of the transmission matrix of the MMF. Some progress in this direction has been achieved through both theoretical and experimental approaches. On the theoretical side, Plöschner et al. established a model that can accurately derive the transmission matrix of a bent MMF from the transmission matrix of the same MMF in the straight state 61. However, this method requires real-time knowledge of the geometric conformation of the MMF and thus has substantial limitations in practical applications. On the experimental side, the key challenge is how to perform recalibration using information obtained solely from the proximal end of the MMF, rather than relying on the traditional interferometric measurement at the distal end 24. So far, several experimental schemes have been proposed to overcome this challenge. In 2013, Farahi et al. employed a beacon source to generate speckle patterns at the proximal end of the MMF, which can be used to identify the correct transmission matrix for the current fiber conformation from a pre-calibrated database 118. Later, schemes that use the light reflected from the distal end of the MMF to deduce its transmission matrix were presented 119−121. As exemplified in Fig. 8(b), these schemes often require the fabrication of micro-optical components, such as partial reflectors and metasurfaces, at the distal end of the optical fiber to facilitate the measurement of the reflection matrix 119,120. In 2021, the memory effect of MMFs was also exploited to estimate the transmission matrix from the proximal end of the MMF with the assistance of a guide star placed at the distal end 52. These techniques can recalibrate the transmission matrix from the proximal end of the MMF, but some issues remain, such as slow calibration speed on the order of minutes, the need for complex surface engineering, and limited applicable scenarios. Recently, Wen et al. proposed a technique referred to as spatial-frequency tracking adaptive beacon light-field-encoded (STABLE) endoscopy, illustrated in Fig. 8(c), to quickly recalibrate the transmission matrix 114. The key point of this technique is to form spatial-frequency beacons at the Fourier plane of the proximal facet of the MMF using full-vector modulated incident wavefronts. By maximizing the intensity of the spatial-frequency beacon, the current fiber deformation state can be quickly determined, and the corresponding pre-calibrated transmission matrix can be retrieved for scanning imaging. The STABLE method achieved a disorder tracking frequency of ~1 kHz, ensuring stable imaging in long MMFs under fiber deformation and complex operating conditions, as shown in Fig. 8(d). To some extent, the STABLE method addresses the above-mentioned issues of other techniques: it significantly reduces the tracking time and eliminates the need for complex engineering of the distal facet. Nevertheless, this method also places some restrictions on operating conditions, such as an MMF length < 50 m, a bending curvature < 80 m⁻¹ and a twisting angle < 130°. Although these restrictions have been significantly relaxed, exploring more versatile and powerful real-time calibration techniques is still of interest in this field.
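The selection step described above, in which the fiber state is identified by finding the pre-calibrated wavefront that maximizes the beacon intensity, can be sketched numerically. The following is a toy model only and not the STABLE implementation: random complex matrices stand in for the pre-calibrated fiber responses, the beacon is a single output mode, and the names `beacon_wavefronts`, `identify_state` and `measure_beacon` are hypothetical.

```python
import numpy as np

def beacon_wavefronts(tms, beacon_index):
    """Phase-only input masks, one per calibrated state, that focus onto the beacon mode."""
    return [np.exp(-1j * np.angle(tm[beacon_index, :])) for tm in tms]

def identify_state(measure_beacon, wavefronts):
    """Try every stored mask and return the state with the brightest beacon."""
    intensities = [measure_beacon(w) for w in wavefronts]
    return int(np.argmax(intensities)), intensities

rng = np.random.default_rng(0)
n_modes, n_states = 64, 5  # assumed toy sizes

# Assumed database: one random complex matrix per pre-calibrated deformation state.
tms = [
    (rng.normal(size=(n_modes, n_modes)) + 1j * rng.normal(size=(n_modes, n_modes)))
    / np.sqrt(2 * n_modes)
    for _ in range(n_states)
]

true_state = 3    # the (unknown) current deformation state of the simulated fiber
beacon_index = 0  # output mode used as the beacon

def measure_beacon(wavefront):
    """Simulated detector: beacon intensity for a given input mask in the current state."""
    field = tms[true_state] @ wavefront
    return float(np.abs(field[beacon_index]) ** 2)

wavefronts = beacon_wavefronts(tms, beacon_index)
found, intensities = identify_state(measure_beacon, wavefronts)
print("identified state:", found)
```

In this sketch the mask calibrated for the matching state adds all mode contributions in phase, whereas masks from other states produce only a speckle-level signal, which is why a simple argmax suffices.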
In addition to the modal instability of the MMF, the auxiliary optical paths also inevitably experience external disturbances, which cause displacement, defocusing, and tilting between the incident light and the MMF. To address mechanical and thermal drifts, many optimization procedures for the rapid calibration of the relative position between the optical path and the MMF have been proposed 61,122. Besides, removing and reinserting the MMF into the imaging system also changes the relative position between the optical path and the MMF. This challenge has been overcome by Schmidt et al. using a customized headplate and the adaptive optics method 123. Moreover, thermal drift often occurs in the SLM because it needs to update masks frequently during scanning imaging. This problem can be solved by using a thermoelectric cooler to stabilize the temperature of the SLM 124. Finally, fluctuations in environmental refractive index and temperature may also affect the imaging performance of MFSEs. The abovementioned STABLE technique can achieve high-resolution imaging in this scenario as well. Besides, calibrating MMFs in a medium with a refractive index matching that of the target application environment is also an alternative solution 97. Overall, these works have effectively promoted the stability of the auxiliary optical path in the MFSE system. The key efforts to enhance the stability of MFSE systems should remain centered on mitigating the impact of external disturbances on MMFs. Finally, to intuitively exhibit the scanning imaging performance of the existing systems, imaging performance factors reported in some typical works are summarized in Table 2. The unique features of these works are also provided.

Different scanning imaging modalities
The works discussed in Section Scanning imaging performance, which aim to enhance the imaging performance of the MFSE system, are suitable for the most common reflection and linear fluorescence imaging modalities. To further enhance the overall imaging performance, other advanced imaging modalities that work in a scanning manner can also be integrated with the MFSE to leverage the advantages of different technologies. A prominent advantage of MFSE is its capability to access and image tissues deep inside the body with minimal invasion, whereas other imaging modalities can provide high imaging contrast, excellent optical sectioning ability, and so on. For example, if the confocal imaging technique is employed to modify the signal acquisition module of the MFSE system, the imaging contrast and resolution of the MFSE can be improved. So far, MFSE has been successfully integrated with various imaging modalities, including confocal endoscopy, two-photon imaging, Raman imaging, second harmonic imaging, and photoacoustic imaging. These integrations demonstrate the remarkable adaptability and versatility of MFSE. Next, we introduce the research progress on the integration of MFSE with these imaging modalities in detail.

Confocal endoscopy
Confocal microscopy is a widely used scanning imaging technique that offers superior image contrast and axial resolution compared to conventional wide-field microscopy 100,125,126. The excellent imaging performance of confocal microscopy is attributed to the spatial filtering effect of a pinhole introduced in front of the photodetector. The pinhole is placed in a plane conjugate to the focal plane of the lens system. When the light beam is reflected from the sample and reaches the pinhole, only the light signals originating from the focal volume of the sample can pass through, while stray light around the focal point and out-of-focus light are filtered out. It is not feasible to directly transplant the confocal microscope arrangement into the MFSE system because of the modal scrambling properties of MMFs. When the light signals originating from the focal point are collected and transmitted through an MMF, they exhibit speckle-like profiles. For these speckles, it is impossible to form a conjugate focal plane for spatial filtering using a conventional lens system.
To overcome this challenge and enable the confocal imaging modality of MFSE, in 2015, Loterie et al. presented a solution based on an ingenious experimental setup. The key point of this solution is to modulate the signal light returning from the MMF through an SLM with the same wavefront shaping mask that generates the current illumination focal spot at the distal end of the MMF 127. In accordance with the reversibility of the wave equation, if the returning signal light originates from the current focal spot generated by the MMF, then after modulation by the SLM, the signal light is the phase conjugate of the illumination light incident on the SLM and is therefore also a plane wave. The modulated signal light, after being focused by a lens, is able to pass through the pinhole positioned in the optical path without obstruction. Conversely, when returning light that originates from stray light is modulated by the current mask, it is not phase-conjugated to the illumination light and is thus filtered out by the subsequent pinhole in the optical path. As such, the flexible use of the SLM enables MFSE to enhance its scanning imaging performance following the principle of confocal imaging. The scanning imaging ability of the confocal MFSE and the common reflection-mode MFSE was compared by employing 30 μm and 2 mm pinholes. The 2 mm pinhole has a limited spatial filtering effect and is therefore used to represent the reflection-mode MFSE. The imaging results indicate that when MFSE operates in a confocal modality, the imaging contrast and resolution are significantly improved.
In addition to the SLM-based experimental implementation of the confocal MFSE, in 2015, Loterie et al. proposed another approach based on the digital confocal principle to realize the confocal imaging modality of MFSE 99. A brief outline of the process to implement the digital confocal method is as follows. Firstly, the complex optical field profile of the returning light output from the proximal end of the MMF is reconstructed using digital off-axis holography. Secondly, the reconstructed optical field is multiplied by the transmission matrix of the MMF to simulate the propagation of light within the MMF. This calculation allows us to reconstruct the optical field profile on the sample plane. Thirdly, a digital pinhole mask is used to selectively preserve the optical field within a range around the focal spot. The range is determined by the size of the digital pinhole. The total energy of the optical field in the preserved region represents the intensity value of a pixel in the image. Repeating this process during the scanning imaging procedure allows for the synthesis of an enhanced image. In addition to the digital confocal method, another correlation method is also employed to enhance image quality. According to the reversibility of light propagation, if the returning light that originates from the focal spot propagates back through the MMF, it should lead to a similar optical field as the input. On the contrary, the optical field originating from the region beyond the focal spot should be uncorrelated with the input optical field due to the modal scrambling properties of MMF. As such, the intensity value of each pixel in an image can be obtained by calculating the correlation between the reconstructed optical field of the returning light and the original illumination light at the proximal end of the MMF. The experimental results demonstrated that both the digital confocal method and the correlation method can enhance image contrast and resolution.
The correlation method has a lower computational cost, while the digital confocal approach is more flexible in terms of image processing. Recently, Singh et al. demonstrated that using multiple virtual pinholes to process the reconstructed optical field on the sample plane can further improve the imaging contrast beyond that achieved with a single digital pinhole 128,129. The integration of MFSE with confocal fluorescence imaging has also been studied theoretically 130. According to these studies, the integrated setup should employ two SLMs to modulate the illumination light and the fluorescence, respectively. Since the spectrum of fluorescence is generally broad, the bandwidth of the SLM and the dispersion of the MMF are key issues that need to be solved. So far, there are no reported experimental implementations of the confocal fluorescence MFSE. Overall, compared to the conventional reflective MFSE, the hardware-based confocal MFSE provides enhanced image contrast and improved resolution while maintaining the same imaging speed. Owing to the spatial filtering function, the confocal MFSE is capable of 3D scanning imaging. The digital confocal MFSE also exhibits these enhancements, but due to the need for data processing, it has a slightly slower imaging speed. Besides, in the hardware-based confocal MFSE, aberrations induced by the curved SLM surface are automatically corrected because the light is modulated by the SLM twice. In short, the digital confocal MFSE offers flexible post-processing, whereas the hardware-based confocal MFSE preserves the imaging speed and inherently corrects SLM aberrations.
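The digital confocal and correlation methods outlined above can be illustrated with a small numerical sketch. It is a toy model under stated assumptions, not the authors' processing code: the fiber is a lossless, reciprocal random unitary matrix `T`, the sample is a one-dimensional line of pixels, the return trip is modeled by the transpose of `T`, and virtual propagation back to the sample plane uses the complex conjugate of `T` (valid only for this idealized fiber). All function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 128  # number of fiber modes = number of sample pixels (toy assumption)

# Toy lossless fiber: random unitary matrix mapping proximal field -> distal field.
a = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
T, _ = np.linalg.qr(a)

reflectivity = np.full(n, 0.2)  # weak uniform background (amplitude reflectivity)
reflectivity[40] = 1.0          # one bright scatterer

def focus_mask(k):
    """Phase-only proximal input that focuses on distal pixel k (leaves some speckle background)."""
    return np.exp(-1j * np.angle(T[k, :])) / np.sqrt(n)

def returned_field(x):
    """Proximal field after illumination, reflection at the sample and the return trip."""
    distal = T @ x
    return T.T @ (reflectivity * distal)  # reciprocity: backward propagation uses T transposed

def bucket_pixel(k):
    """Reflection-mode pixel: all returned energy is integrated (very large pinhole)."""
    y = returned_field(focus_mask(k))
    return float(np.sum(np.abs(y) ** 2))

def digital_confocal_pixel(k, half_width=0):
    """Propagate the returned field virtually to the sample plane and apply a digital pinhole."""
    y = returned_field(focus_mask(k))
    sample_field = np.conj(T) @ y  # inverse of T.T for a unitary T
    lo, hi = max(0, k - half_width), min(n, k + half_width + 1)
    return float(np.sum(np.abs(sample_field[lo:hi]) ** 2))

def correlation_pixel(k):
    """Overlap of the returned field with the illumination field at the proximal end."""
    x = focus_mask(k)
    y = returned_field(x)
    return float(np.abs(x @ y) ** 2)  # unconjugated product: light from the focus returns phase-conjugated

bright, dark = 40, 10
contrast_bucket = bucket_pixel(bright) / bucket_pixel(dark)
contrast_confocal = digital_confocal_pixel(bright) / digital_confocal_pixel(dark)
contrast_correlation = correlation_pixel(bright) / correlation_pixel(dark)
print(contrast_bucket, contrast_confocal, contrast_correlation)
```

In this model the bucket signal at a dark pixel is raised by speckle light that hits the bright scatterer elsewhere on the sample, while the digital pinhole rejects that contribution. The correlation pixel needs only one inner product per scan point, which is consistent with its lower computational cost.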

Two-photon imaging
The two-photon fluorescence imaging technique, which relies on the nonlinear two-photon excitation effect, forms a significant branch of conventional scanning microscopy 131. Two-photon excitation is the process in which a fluorescent molecule simultaneously absorbs two photons with a longer wavelength than that required for single-photon excitation, leading to an electronic transition to an excited state and subsequent fluorescence emission. The fluorescence signal intensity in two-photon excitation is proportional to the square of the excitation light's intensity. Therefore, if a focused light spot is used for two-photon fluorescence excitation, the fluorescence signal is concentrated within a smaller volume than the focused spot itself, which can significantly improve the resolution and signal-to-noise ratio of fluorescence imaging. Besides, due to the longer excitation wavelength in two-photon imaging, this technique offers better penetration of biological tissues than single-photon fluorescence imaging 132. Despite these advantages, highly efficient two-photon absorption requires a high photon density, which is usually achieved with femtosecond pulsed lasers. To combine the two-photon imaging modality with MFSE, it is necessary to achieve spatiotemporal focusing of the broadband femtosecond laser pulse through the MMF. However, this task is challenging because the modal and chromatic dispersion of the MMF lead to a speckled spatial profile of the optical field and to temporal pulse broadening 133,134. The wavefront shaping schemes mentioned earlier are mainly designed for the monochromatic case. When it comes to focusing a femtosecond pulsed laser, additional measures are needed to deal with the complex dispersion issues within the MMF.
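The confinement of the two-photon signal to a smaller volume than the focal spot can be checked with a short numerical sketch. It assumes a Gaussian focal intensity profile of unit width, which is an illustrative choice and not a property reported in the works discussed here; the helper `fwhm` is hypothetical.

```python
import numpy as np

def fwhm(x, profile):
    """Full width at half maximum of a sampled single-peaked profile."""
    above = x[profile >= 0.5 * profile.max()]
    return float(above[-1] - above[0])

x = np.linspace(-5.0, 5.0, 20001)  # lateral position, arbitrary units
fwhm_focus = 1.0                   # assumed FWHM of the focal intensity profile

# Assumed Gaussian focal intensity with the chosen FWHM.
intensity = np.exp(-4.0 * np.log(2.0) * x**2 / fwhm_focus**2)

# Two-photon signal scales with the square of the excitation intensity.
two_photon_signal = intensity**2

ratio = fwhm(x, two_photon_signal) / fwhm(x, intensity)
print("width ratio (two-photon / linear):", ratio)
```

For a Gaussian profile, squaring the intensity narrows the width by a factor of 1/sqrt(2). Real MMF foci sit on a speckle background, so the practical gain also depends on the focusing quality.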
In 2015, Morales-Delgado et al. proposed a time-gated digital phase conjugation scheme to focus the femtosecond laser through the MMF 135. In the calibration, they used time-gated interferometry and off-axis holography to record the complex optical field of the laser pulse output from an MMF. The recorded information enables them to selectively reconstruct the phase-conjugated optical field corresponding to a set of modes with similar group velocities. When the reconstructed optical field is coupled into the MMF, only the selected set of modes with negligible modal dispersion is excited in the MMF. These modes can generate a focal spot with minimal temporal broadening at the distal end of the MMF. The quality of the generated focal spot is not high because only a portion of the eigenmodes is used. Since graded-index MMFs have a greater number of eigenmodes with similar group velocities, they are a superior choice for two-photon imaging compared to step-index MMFs. In the same year, they demonstrated that the focal spots generated with this method can be exploited for two-photon scanning imaging 136. In the experiment, they used an additional prism pair to compensate for the group velocity dispersion. The time-gated mode selection method, in conjunction with this dispersion compensation measure, achieved the transmission and focusing of nearly Fourier-limited optical pulses. The output pulse has a duration of 120 fs, which is close to the initial pulse duration of 100 fs. The two-photon scanning imaging experiment utilizing these focused laser pulses achieved lateral and axial resolutions of ~1 μm and ~15 μm, respectively, which are finer than those of single-photon fluorescence imaging. They also conducted a scanning imaging experiment within a 3D volume to demonstrate the depth sectioning ability of the two-photon MFSE, which is a significant advantage over the linear fluorescence imaging modality of the MFSE.
Later, Pikálek et al. systematically investigated the impact of the bandwidth of both the SLM and the MMF on the focusing capability. Two dispersion compensation schemes, namely SLM correction and prism correction, were provided to maximize the system's bandwidth 137,138. When a continuous-wave laser with the same wavelength as the central wavelength of the femtosecond pulse is used for calibrating the wavefront shaping mask, it is also possible to focus the femtosecond pulse without strict dispersion management. Compared to the previously mentioned approach, the quality of the focal spot obtained here is lower and the pulse duration is longer. Turcotte et al. demonstrated that focal spots produced by this approach can also be used for volumetric two-photon fluorescence imaging in living brain tissue 139. Furthermore, Sivankutty et al. proved that when extremely short MMFs, typically spanning just a few centimeters, are used in the MFSE, the effects of dispersion on pulse broadening and femtosecond laser focusing can be safely neglected 140. In such scenarios, the MMF can be directly applied to two-photon imaging without specialized dispersion management.
In addition to the wavefront shaping method, in 2021, Velsink et al. also proposed a scheme based on the temporal shaping of pulses to achieve the two-photon MFSE 141. When the femtosecond pulse is shaped in the temporal domain, the pulse duration is only short in a predefined area of the distal end of MMF, while the pulses in the other region are broadened. This enables enhanced two-photon excitation only in a predefined region. They achieved two-photon raster scanning imaging in a square-core step-index MMF using this method. Furthermore, considering the complex measures required to address dispersion issues in two-photon imaging, Laporte et al. suggested an alternative nonlinear imaging method, namely saturated excitation, to improve the fluorescence imaging resolution of MFSE 98. The saturated excitation method only requires a narrowband continuous laser, which helps to avoid dispersion issues. Experimental results have shown that the resolution of MFSE based on saturated excitation is enhanced by 1.6 times in all three dimensions compared to the linear fluorescence imaging modality of the MFSE.
Overall, the current two-photon MFSE is not yet mature, and some pressing issues need to be addressed. On the one hand, the quality of the focal spots obtained with the schemes mentioned above is relatively poor. As a result, the resolution advantage of two-photon imaging is partially offset by the poor focusing quality. On the other hand, to achieve relatively high excitation power, an LC-SLM with high diffraction efficiency is necessary for wavefront shaping, which results in slow imaging speed. Therefore, further advancement of the two-photon MFSE requires overcoming these challenges.

Raman imaging
Raman scattering is an inelastic scattering process that occurs when photons interact with molecules in matter. During this process, photons can either lose or gain vibrational energy from the molecules, resulting in a shift of the photons' wavelength toward longer or shorter wavelengths, which are well known as the Stokes shifts and anti-Stokes shifts, respectively. These wavelength shifts are associated with the vibrational transitions of chemical bonds within molecules. Therefore, Raman scattering spectra can be used to identify chemical compounds, which finds extensive applications in pharmaceutical analysis, biological process detection, disease diagnosis and so on. Fiber probes are crucial tools for Raman spectroscopic imaging. The miniaturization of probes enables many new applications and has been an ongoing focus for researchers. With the advancement of MFSE, MMFs have rapidly emerged as highly promising candidates for Raman imaging probes due to their extremely small footprint. So far, both the spontaneous Raman scattering imaging and the coherent anti-Stokes Raman scattering (CARS) imaging techniques have been combined with the MFSE to achieve label-free endoscopic imaging with high chemical contrast.
In 2017, Gusachenko et al. first employed MFSE for Raman spectroscopic imaging by replacing the photodetector in the common reflection imaging modality with the spectral measurement module as shown in Fig. 9(a) 142. A continuous laser with 532 nm wavelength was focused by the MMF as the excitation beam to induce Raman scattering on the sample pixel-by-pixel. The spectra measured at each pixel are a result of the superposition of Raman scattering spectra from different compounds. Therefore, they utilized a non-negative matrix factorization method to decompose the acquired spectral data and calculate the contribution weights of each compound to the overall spectrum. Then, the weight values of compounds at each pixel serve as the intensity of the image to generate the final Raman spectral image. To display the distribution of different compounds within the same image, several color channels are typically used to represent distinct compounds. Figure 9(b) displays the spectral components decomposed for four different pharmaceuticals. These pharmaceuticals cannot be distinguished in the bright-field image while their distributions can be well identified in the Raman image with chemical contrast as shown in Fig. 9(c). This result demonstrates the unique advantages of MFSE working in Raman imaging modality over other imaging modalities in compound analysis. Besides, theoretical analysis indicates that the light-focusing ability of MMF probes based on the wavefront shaping technique enhances Raman detection signal-to-noise ratio by more than three orders of magnitude compared to speckle illumination. Later, the Raman imaging MFSE was also exploited to acquire the distribution of compounds in optically cleared samples 143.
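The decomposition of per-pixel spectra into compound spectra and contribution weights can be sketched with a basic non-negative matrix factorization. The example below uses the standard multiplicative-update algorithm on synthetic data; the specific algorithm, the Gaussian peak positions and the 8 x 8 scan grid are assumptions for illustration and are not taken from the cited work.

```python
import numpy as np

def nmf(V, rank, n_iter=1000, seed=0, eps=1e-12):
    """Factor non-negative V (pixels x wavenumbers) as W (pixels x rank) @ H (rank x wavenumbers)."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], rank)) + eps
    H = rng.random((rank, V.shape[1])) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep W and H non-negative at every step.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ (H @ H.T) + eps)
    return W, H

# Synthetic Raman-like spectra of two hypothetical compounds.
shift = np.linspace(400.0, 1800.0, 200)  # Raman shift axis in cm^-1

def peak(center, width=25.0):
    return np.exp(-0.5 * ((shift - center) / width) ** 2)

spectra = np.vstack([peak(700.0) + 0.5 * peak(1500.0), peak(1100.0)])

# Each scanned pixel holds a non-negative mixture of the two compounds.
rng = np.random.default_rng(1)
weights = rng.random((64, 2))  # 64 pixels, e.g. an 8 x 8 scan
V = weights @ spectra          # measured spectra, one row per pixel

W, H = nmf(V, rank=2)
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)

# Column j of W, reshaped to the scan grid, is the image channel of component j.
component_images = W.T.reshape(2, 8, 8)
print("relative reconstruction error:", rel_err)
```

Rows of `H` play the role of the decomposed spectral components, and each column of `W` can be assigned to one color channel of the final chemical-contrast image.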
Despite the substantial potential of Raman imaging with MFSE, it suffers from a strong Raman background from the silica material of the MMF and from extremely slow imaging speed. The background signal originating from the silica MMF covers a wide wavenumber range (<500−1700 cm⁻¹), which overlaps with the Raman shifts of many substances that need to be identified in clinical applications. To solve this problem, Deng et al. proposed using a sapphire MMF as a substitute for the silica MMF in Raman imaging 144. The sapphire MMF itself has only several sharp Raman peaks and thereby allows highly efficient detection and imaging in the wavenumber range of >800 cm⁻¹. However, this solution is imperfect because sapphire MMFs lack cladding, which causes their optical transmission properties to fluctuate with environmental changes and thus degrades the scanning imaging performance. The slow imaging speed is primarily caused by the extremely weak spontaneous Raman signals, which typically require the spectrometer to integrate for several seconds to acquire data from a single scanning point. Coherent anti-Stokes Raman scattering (CARS) is an effective approach to enhance the Raman signal and thereby significantly reduce the signal acquisition time 145. In 2019, Trägårdh et al. successfully implemented CARS imaging through the MFSE and achieved an integration time of ~1 ms for Raman signal acquisition in the epi-detection condition, which is several orders of magnitude faster than spontaneous Raman imaging 146. In the experiment, synchronized pump and Stokes pulses, after wavefront shaping by the same SLM, were focused onto the sample through the MMF. To acquire different Raman shifts, the wavelength of the pump laser must be tuned over a range. Both the use of laser pulses and the wavelength tuning of the pump beam pose dispersion issues, which were solved by properly selecting a graded-index MMF and calibrating the signal with the corresponding intensities of the focal spots. However, when a graded-index MMF is used for CARS imaging, an unwanted four-wave mixing background signal exists due to the self-imaging of the focal spot inside the graded-index MMF. In 2022, Pikálek et al. presented an MMF probe made by splicing a segment of step-index MMF onto the graded-index MMF 147. This special probe prevents the focal point from forming inside the graded-index MMF, thereby significantly reducing the four-wave mixing background and improving the imaging contrast. Furthermore, another nonlinear imaging modality, i.e., second harmonic imaging, has also been integrated with MFSE. In 2021, Cifuentes et al. implemented polarization-resolved second harmonic imaging through MFSE in the same epi-detection manner 148. Using this imaging modality, they successfully discerned structural proteins deep within mouse tail tendon and heart tissue. Both Raman imaging and second harmonic imaging are label-free techniques, which are important complements to fluorescence-based imaging methods. The successful implementation of these two imaging modalities with the MFSE greatly expands its versatility.

Photoacoustic endoscopy
Photoacoustic imaging is a widely used hybrid imaging modality 149, which merges the principles and advantages of optical excitation and ultrasound detection to achieve high-resolution images of objects. In the photoacoustic imaging technique, a nanosecond pulsed laser with a wavelength in the high absorption band of the sample is used for illumination. The temperature of the region illuminated by the pulsed laser changes, leading to the generation of acoustic signals based on the thermoelastic effect [...] single-mode fiber with a plano-concave polymer microresonator on its tip. In experiments, a nanosecond pulsed laser was modulated by an LC-SLM using the masks obtained by the transmission matrix measurement method. Then, focal spots at different positions were sequentially produced and raster-scanned over a sample at the distal end of the MMF. The ultrasound signals acquired at each scanning point can be used to reconstruct the image. The lateral resolution of photoacoustic MFSE depends on the size of the focused light spots and thus can be very fine. However, due to the use of an LC-SLM for wavefront shaping, the imaging speed of their system was very slow. In 2022, Zhao et al. achieved a video-rate photoacoustic MFSE using a DMD as the SLM 151. Masks employed for wavefront shaping were acquired by a real-valued intensity transmission matrix algorithm. The imaging speed and spatial resolution of this setup varied from ~2 to 57 frames per second and from 1.7 to 3 μm, respectively, depending on the scanning step size. This work employed transmission-mode detection with a bulky ultrasound detector, which hindered its practical usage. In the same year, they modified their setup by integrating a highly sensitive fiber-optic microresonator ultrasound sensor with the MMF probe within a 20 gauge medical needle while retaining the DMD for high-speed wavefront shaping 152. In another work, they integrated a transparent ultrasound sensor based on an ITO-coated PVDF thin film on the distal end of the MMF 153. They also demonstrated the 3D imaging ability of the photoacoustic MFSE by both acoustic and optical sectioning. The axial resolution of optical sectioning is ~25 μm, which is roughly half that of acoustic sectioning. As a scanning imaging technique, the single-frame imaging time of the photoacoustic MFSE depends on the scanning speed and the number of scanning points included in an image. When the scanning speed is constrained by the refresh rate of the SLM, reducing the number of pixels in an image is a necessary approach to boost imaging speed. To reduce the number of scanning points without compromising image resolution, Zhao et al. introduced a method that utilizes a deep convolutional neural network to reconstruct the high-resolution image from the sparsely sampled photoacoustic imaging result 154. They validated that this method can boost imaging speed by 16 times without substantial degradation of image quality.
Finally, to intuitively compare the different imaging modalities of MFSE, the contents of some representative works studying these imaging modalities are summarized in Table 3. In terms of the experimental setups, different imaging modalities have varied requirements for light sources, detectors, and SLMs. The nonlinear imaging modalities, including two-photon imaging, spontaneous Raman imaging and CARS imaging, commonly require high peak intensity and total power for excitation, and thus the ultrashort pulse laser and the LC-SLM with high diffraction efficiency are employed. Slow imaging speed is a common problem for these imaging modalities due to the usage of the LC-SLM. The long signal acquisition time of spontaneous Raman imaging is the dominant factor for its slow imaging speed. Note that the problems listed in the table only outline issues related to employing MMFs across different modalities rather than the inherent characteristics of the modalities. In another aspect, the confocal, two-photon and photoacoustic imaging modalities of MFSE have all been demonstrated to possess 3D imaging capacity, while the Raman and CARS imaging modalities can identify chemical compounds. Therefore, diverse imaging modalities greatly expand the application scenarios of the MFSE to cases that are difficult to address with the common reflection or fluorescence imaging mode. Notably, as these MMF-based imaging modalities are still in the developmental stage, applications presented in the literature are mainly focused on ex vivo imaging of samples. To promote the practical applications of these MMF-based imaging methods, significant efforts are still required to address the issues listed in the table.
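The dependence of frame rate on scanning speed and number of scanning points, discussed above for the photoacoustic MFSE, can be made concrete with a short calculation. The point rate and image size below are hypothetical values chosen only for illustration, and the sketch assumes that the SLM refresh rate is the sole speed limit.

```python
def frame_rate(points_per_frame, point_rate_hz):
    """Frames per second when each scanning point needs one SLM mask update."""
    return point_rate_hz / points_per_frame

point_rate_hz = 20_000.0           # hypothetical mask refresh rate
full_points = 128 * 128            # hypothetical fully sampled image
sparse_points = full_points // 16  # 16-fold fewer scanning points

full_fps = frame_rate(full_points, point_rate_hz)
sparse_fps = frame_rate(sparse_points, point_rate_hz)
speedup = sparse_fps / full_fps
print(full_fps, sparse_fps, speedup)
```

Under this assumption, scanning 16 times fewer points gives a 16-fold higher frame rate, which matches the speed gain quoted for the sparse-sampling approach; the missing pixels then have to be recovered by the reconstruction network.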

Applications
Over the past decade, significant progress has been made in improving the MFSE system. However, we have to acknowledge that this technique is still in development and is not yet operational in practical applications. Nevertheless, some studies have begun to attempt the implementation of this technology in practical settings. Besides, many research works that aimed at improving the imaging performance of the MFSE and expanding its imaging modalities have included some application demonstrations. Next, we will provide an introduction to these application demonstrations of MFSEs, which mainly include biomedical imaging, 3D volume imaging and compound composition analysis.
Biomedical imaging is the primary target application field of MFSEs. The greatest advantage of MFSEs lies in their extremely small footprint, which promises minimal invasion of biological tissue during imaging. Besides, MFSEs can achieve endoscopic imaging of narrow tubular structures that cannot be accessed by other endoscopes. In the early stages, biomedical imaging demonstrations primarily focused on ex vivo cellular specimens, such as stained neuronal cells, human epithelial cells, CHO cells, and mouse red blood cells 70,99,140,152. With the improvement in imaging speed, researchers began conducting in vivo imaging of live cells. In 2018, Ohayon et al. employed the MFSE to capture the processes of baby hamster kidney cells expressing GFP and rapid neuronal dynamics in vivo 155. Subsequently, Turtaev et al. and Vasquez-Lopez et al. performed in vivo endoscopic imaging experiments on anaesthetized mice 2,42. They utilized MFSE to observe subcellular neuronal structures, dendrites, and synapses in the deep layers of the mouse brain, while also monitoring the stimulus-driven Ca²⁺ responses in neurons. In addition to imaging deep brain tissues with strong scattering characteristics, Wen et al. performed in vivo endoscopic imaging of tubular organs of animals, including the sheep small intestine and the pig esophagus 114. They also conducted endoscopic imaging in live mice and performed disease diagnosis on their colon tissues. In medical applications, endoscopes not only aid in the diagnosis of diseases but can also guide surgical interventions for treatment. Kakkava et al. employed the two-photon MFSE to guide femtosecond laser ablation in biological samples for highly selective modifications, which demonstrates the potential of MFSE for assisting medical surgeries 156. In biomedical imaging, the crucial technical parameters include the imaging speed and the stability of the system. Some technological improvements, such as the use of DMDs and the STABLE technique, have improved these two parameters, consequently enabling MFSEs to achieve the above advancements in biomedical imaging applications.
As a scanning imaging technique, MFSE can be easily extended from two-dimensional imaging to 3D imaging. Even with common bright-field or fluorescence imaging modalities, 3D imaging can be achieved by focusing the light field at various positions in 3D space. So far, 3D volume imaging has been demonstrated not only in fluorescence microsphere samples but also in live animal models 114,122,157. For example, Stibůrek et al. utilized the MFSE to conduct in vivo volumetric imaging of the mouse brain and recorded volumetric data on the dynamics of sub-cellular structures 157. In 3D imaging applications, a crucial technical parameter is the axial resolution. If MFSE adopts the confocal or two-photon imaging modality, the axial resolution can be significantly improved, as discussed in Sections Confocal endoscopy and Two-photon imaging 99,140. Moreover, when performing volumetric imaging of tissues with strong optical scattering, the imaging depth is also a crucial parameter. In such cases, the photoacoustic imaging modality of MFSE is preferred because its acoustic sectioning function can increase the imaging depth 152.
Furthermore, MFSE also can be used for composition

Challenge and outlook
Although there have been significant advancements in improving the scanning imaging performance and expanding application fields of the MFSE, this imaging technology still encounters several daunting challenges.
Here, we do not address challenges specific to a single imaging modality of MFSE, such as the dispersion issue in two-photon imaging, which have been covered in the corresponding sections. Instead, we discuss issues common to MFSE across imaging modalities, along with potential remedies. From our perspective, four primary problems need to be addressed.
The first problem is the well-known modal instability of MMF under dynamic disturbances, such as distortion and bending of the MMF, mechanical stress, and temperature fluctuation. In Section System stability, we have reviewed the progress made in tackling this issue, which mainly involves developing MMFs with better resistance to disturbances, recalibrating transmission matrices by theoretical methods, and recalibrating transmission matrices in real time at the proximal end of the MMF. However, these methods all have one or more of the following limitations: (i) stringent application conditions, (ii) slow processing speed, (iii) unsuitability for implementation during the in-situ imaging process, and (iv) complex engineering requirements that result in inadequate experimental resolution. The current state-of-the-art recalibration solution, while overcoming most of these issues, still imposes a few restrictions on operating conditions 114. Therefore, exploring more versatile and powerful approaches to improve the stability of MFSE remains of interest in this field. From our perspective, the most elegant solution is to start at the optical fiber device level and manufacture optical fibers that are extremely insensitive to disturbances. For example, one could develop ultraprecise parabolic-index optical fibers or modified disordered optical fibers that take advantage of the Anderson localization effect for robust image transmission 116,158. Alternatively, different disturbance-resistant strategies may need to be combined synergistically to provide a robust imaging method that allows flexible MMFs to be used.
The second problem is the relatively slow imaging speed when acquiring high-resolution images. Because MFSE is based on the point-scanning principle, its imaging resolution is constrained by the size of the focal spot and the number of scanning points used to form an image. Achieving high image resolution requires a large number of scanning points, which reduces the imaging speed. Even if the currently available SLM with the fastest refresh rate is employed for wavefront shaping, the point-scanning speed is only ~23 kHz, which corresponds to a slow frame rate of ~2.3 fps for an image of 100×100 pixels. This still falls short of the requirements for video-rate endoscopic imaging. Given the current hardware constraints, reducing the number of scanning points is essential to boost imaging speed. One option is to sacrifice the FOV, which reduces the number of scanning points while maintaining high imaging resolution. However, a large FOV is also very important for many applications. Another potential solution is to train a machine learning model to reconstruct the high-resolution image from an undersampled image created with only a few scanning points 159,160. This approach holds promise for simultaneously achieving fast imaging speed, high resolution, and a large FOV. The reliability and generalization ability of machine learning models require in-depth study in this scheme. Notably, medical endoscopic imaging is one of the potential application areas of MFSE, where the imaging targets are relatively fixed for the diagnosis of a specific disease. Therefore, the demands on the generalization ability of machine learning models are not very stringent in this scenario. As such, machine learning solutions hold significant promise for increasing the imaging speed of the MFSE.
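The frame-rate figure quoted above follows from dividing the point-scanning rate by the number of scanning points per frame. The short Python sketch below reproduces that arithmetic and also estimates how small a square image would have to be to reach a given frame rate. The function names and the 25 fps target are illustrative assumptions, not values taken from the reviewed works.

```python
import math


def frame_rate(point_rate_hz: float, width_px: int, height_px: int) -> float:
    """Frames per second when every pixel needs one focal-spot position."""
    return point_rate_hz / (width_px * height_px)


def max_square_image_side(point_rate_hz: float, target_fps: float) -> int:
    """Largest N such that an N x N point scan still reaches target_fps."""
    points_per_frame = int(point_rate_hz / target_fps)
    return math.isqrt(points_per_frame)


if __name__ == "__main__":
    rate_hz = 23_000.0  # ~23 kHz point-scanning speed quoted in the text
    print(f"{frame_rate(rate_hz, 100, 100):.1f} fps for 100 x 100 pixels")
    # 25 fps is an assumed video-rate target, used only for illustration.
    side = max_square_image_side(rate_hz, 25.0)
    print(f"{side} x {side} pixels at 25 fps")
```

Under these assumptions, the number of scanning points per frame would have to drop by roughly an order of magnitude to approach video rate, which is why sparse sampling combined with learned reconstruction is attractive.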
The third challenge involves the trade-off between the working distance and the imaging resolution. This is a fundamental limitation determined by the aperture sizes of MMFs. As discussed in Section Imaging resolution, there are still no effective approaches to simultaneously achieve excellent imaging resolution and long working distances 91,92. Hence, new technologies, such as machine learning and synthetic aperture techniques, need to be introduced to overcome this challenge according to the concrete application scenario.
The last challenge, which is also a prospect, is how to develop a miniaturized and integrated MFSE system to enhance the system's stability. The current optical configuration of MFSE is very complex. In particular, when full modulation of the phase, amplitude, and polarization of the optical field is carried out to achieve high-quality focal spots, a large number of optical components are employed and all of them need to be precisely aligned. Such a sophisticated bulk optical setup is inherently difficult to isolate from external disturbances, which can lead to misalignment among the many optical elements, as reviewed in Section System stability. In commercial products, it is not feasible to expect customers to master the complicated system alignment methods proposed by researchers in the literature. A potential solution is to develop a miniaturized and integrated optical system for MFSE using photonic integrated chips. The wavefront shaping function, which is the key challenge of this idea, may be achieved through the combination of dense array waveguides, phase shifters, and optical steering techniques. Nevertheless, the feasibility of this scheme requires collaborative validation by experts in different research fields.
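The trade-off between working distance and resolution described above can be illustrated with a simple geometric estimate. The Python sketch below assumes that, at a distance z from the distal facet, the on-axis effective numerical aperture is the smaller of the fiber NA and the geometric value a / sqrt(a² + z²) set by the core radius a, and that the focal-spot FWHM scales as λ / (2·NA). Both relations are textbook approximations adopted here for illustration, and the fiber parameters are hypothetical rather than taken from the reviewed works.

```python
import math


def effective_na(fiber_na: float, core_radius_um: float,
                 working_distance_um: float) -> float:
    """On-axis effective NA: fiber NA near the facet, aperture-limited far away."""
    if working_distance_um <= 0:
        return fiber_na
    geometric_na = core_radius_um / math.hypot(core_radius_um, working_distance_um)
    return min(fiber_na, geometric_na)


def spot_fwhm_um(wavelength_um: float, na: float) -> float:
    """Approximate diffraction-limited focal-spot FWHM, lambda / (2 NA)."""
    return wavelength_um / (2.0 * na)


if __name__ == "__main__":
    # Hypothetical fiber: NA 0.22, 50 um core diameter, 532 nm light.
    fiber_na, core_radius_um, wavelength_um = 0.22, 25.0, 0.532
    for z_um in (0.0, 50.0, 100.0, 200.0, 500.0):
        na = effective_na(fiber_na, core_radius_um, z_um)
        fwhm = spot_fwhm_um(wavelength_um, na)
        print(f"z = {z_um:5.0f} um  NA_eff = {na:.3f}  FWHM ~ {fwhm:.2f} um")
```

In this simplified picture the spot size stays constant close to the facet and then grows roughly linearly with working distance once the core aperture becomes the limiting factor, which matches the qualitative behavior described in the text.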

Conclusions
MFSE is a highly promising endoscopic imaging system that has the potential to replace single-mode fiber and fiber bundle endoscopes owing to its ultrathin footprint and high information transmission density. Over the past decade, MFSE has attracted extensive research interest and achieved substantial advances in imaging performance and in the range of its applications. In this review, we first introduced the wavefront shaping principle used to generate focal spots through an MMF and the procedures to implement all-optical scanning imaging without mechanical actuators. We reviewed in detail the development of three techniques for acquiring the wavefront shaping mask displayed on the SLM: the transmission matrix measurement method, the digital phase conjugation method, and phase optimization algorithms. We then focused on the scanning imaging performance of MFSE and the strategies employed to enhance these performance metrics. In addition to general improvements, we presented the combination of advanced imaging modalities, including confocal endoscopy, two-photon imaging, Raman imaging, and photoacoustic imaging, with MFSE systems to enhance imaging performance in specific application scenarios. We also introduced the applications of MFSEs that have been demonstrated in the literature. Finally, we summarized the challenges faced by current MFSEs and the potential solutions. Looking ahead, if the problems described above can be overcome, MFSE could begin its journey toward commercialization. Because of the trade-offs among the various imaging performance metrics, the most crucial performance indicators can be optimized for each specific application scenario. The successful commercialization of MFSE is expected to substantially drive progress in diverse application fields spanning biology, medicine, and industry.

Fig. 1 | Schematic of MFSE based on the wavefront shaping technique. The upper part illustrates three technical routes to acquire appropriate masks for wavefront shaping, aiming at generating high-quality focal spots at the output end of the MMF for scanning imaging. The bottom two boxes illustrate the developments of MFSE, including improving imaging performance and combining the MFSE with diverse advanced imaging modalities.

Fig. 2 | Schematic of optical fields propagating through an MMF. Due to the modal dispersion and mode coupling of the MMF, the input optical field with an ordered profile is converted into a speckled output optical field. ΔβL represents the phase differences between different eigenmodes caused by the intermodal dispersion.

Fig. 3 | (a) Schematic to illustrate the definition of the transmission matrix of an MMF. Schematics of experimental setups to measure the transmission matrix of MMF or disordered media using (b) internal and (c) external references. (d) Schematic of the experimental setup to measure the transmission matrix through the spatial frequency domain method. Figure reproduced with permission from: (b) ref. 41, American Physical So-

Fig. 4 | (a) Schematic of the experimental setup to implement digital phase conjugation through an MMF. (b) Illustration of the four steps to realize light focusing through an MMF based on the digital phase conjugation principle. Figure (a) reproduced with permission from ref. 22, © Optica Publishing Group.

Fig. 5 | Principle of the interference-based optimization method. Optimization procedures based on (a−d) internal and (e−g) external reference pathways, respectively. When the external reference pathway is exploited, the effect of polarization on the light focusing can be comprehensively analyzed. Figure reproduced with permission from ref. 23, © Optica Publishing Group.

Fig. 6 | (a) Experimental setup of an MFSE working in an all-optical scanning imaging manner. L1-L4: lenses, A1: aperture, PD: photodetector, B1: beam splitter or dichroic mirror for reflected light or fluorescence, respectively. (b) Procedures to conduct all-optical scanning imaging. (c) Combined image of sequentially generated focal spots at the distal end of the MMF. (d) Scanning imaging result of a test target. Figures (c) and (d) reproduced with permission from ref. 42, Springer Nature, under a Creative Commons Attribution 4.0 International License.

Fig. 7 | Images of (a) neuronal somata and (b) neuronal processes in the brain of a mouse, as well as (c) a haemorrhage in the primary visual cortex, captured by the MFSE in vivo at an imaging speed of 3.5 frames/s. Figure reproduced with permission from ref. 42, Springer Nature, under a Creative Commons Attribution 4.0 International License.

Fig. 8 | (a) Image of an MMF encapsulated within a 25-gauge needle to prevent deformation of the MMF. (b) Schematic to illustrate the calibration of the transmission matrix at the proximal end of the MMF based on the reflection matrix scheme. A metasurface reflector is added at the distal end of the MMF to facilitate the measurement of reflection matrices for different wavelengths. (c) Schematic of spatial-frequency tracking adaptive beacon light-field-encoded endoscopy to detect and track disorders caused by movements and manufacturing defects. (d) The geometric state change and the corresponding imaging recovery using the STABLE tracking method. Figure reproduced with permission from: (a) ref. 70, © Optica Publishing Group, (b) ref. 120, American Physical Society, under a Creative Commons Attribution 4.0 International License, (c) and (d) ref. 114, Springer Nature, under a Creative Commons Attribution 4.0 International License.

fect. These acoustic signals can be detected by ultrasound sensors and used to create images. Photoacoustic imaging based on the optical absorption effect can be carried out either in a label-free manner or with the addition of exogenous agents to enhance contrast. In addition, the photoacoustic imaging modality allows for 3D imaging through acoustic sectioning using time-resolved acoustic signals or optical sectioning with focal spots at varying depths. Nevertheless, both sectioning approaches have a limited depth of penetration. To diagnose lesions deep within tissues or tubular organs, photoacoustic endoscopy with miniaturized probes is a favored solution. The MFSE using an ultrathin MMF probe and the photoacoustic imaging technique are mutually compatible and offer complementary advantages. Recently, some researchers have combined these two techniques to achieve photoacoustic MFSE. In 2020, Mezil et al. implemented the photoacoustic MFSE by placing a fiber-optic ultrasound sensor side by side with the MMF probe 150. The ultrasound sensor consists of a

Fig. 9 | (a) Experimental setup of Raman spectroscopic imaging with the MFSE. (b) Raman spectral components of four pharmaceuticals. (c) Images of drug clusters obtained by bright-field (left) and Raman spectroscopic (right) imaging modes. Figure reproduced from ref. 142, under the terms of the Creative Commons Attribution 4.0 License, Optica Publishing Group. Some cropping adjustments have been made to the original image.

Table 1 | Focusing performance of MMFs based on different techniques.
L: length of MMF, D: core diameter of MMF, λ: wavelength of light.

Wu GX et al. Opto-Electron Sci 3, 230041 (2024) https://doi.org/10.29026/oes.2024.230041

fiber. According to the data presented in the table, the sizes of the focal spots produced by different techniques under various experimental conditions differ somewhat, but not significantly. The differences in focusing performance between techniques are primarily reflected in the contrast of the focal spots. Techniques based on interferometric measurements, such as TM-I, PC, PC-R, and PO-I, can generate focal spots with significantly higher contrast than those acquired by non-interferometric methods. These results indicate that techniques based on interferometric measurements can more accurately capture the modal scrambling properties of MMFs, thus enabling more effective wavefront shaping to generate focal spots. For this reason, calibration techniques based on interferometric measurements remain the mainstream methods for obtaining wavefront shaping masks.
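To make the link between an accurately measured transmission matrix and focal-spot contrast concrete, the following Python sketch models the fiber as a random complex Gaussian matrix, forms a phase-only input mask by conjugating the phases of one matrix row, and compares the resulting peak-to-background ratio with that of an unshaped flat wavefront. The random-matrix model, the phase-only modulation, and the peak-to-background metric are all simplifying assumptions chosen for illustration; they are not the measurement procedures or the contrast definition used in the works compared in Table 1.

```python
import cmath
import random


def random_tm(n_out: int, n_in: int, seed: int = 0) -> list:
    """Toy transmission matrix with i.i.d. complex Gaussian entries."""
    rng = random.Random(seed)
    return [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n_in)]
            for _ in range(n_out)]


def phase_conjugate_mask(tm: list, target: int) -> list:
    """Phase-only input: unit amplitude, phase opposite to the target row."""
    return [cmath.exp(-1j * cmath.phase(t)) for t in tm[target]]


def output_intensity(tm: list, field_in: list) -> list:
    """Intensity at every output pixel for a given input field."""
    return [abs(sum(t * e for t, e in zip(row, field_in))) ** 2 for row in tm]


def peak_to_background(intensity: list, target: int) -> float:
    """Target-pixel intensity divided by the mean of all other pixels."""
    background = [v for i, v in enumerate(intensity) if i != target]
    return intensity[target] / (sum(background) / len(background))


if __name__ == "__main__":
    tm = random_tm(n_out=100, n_in=64)
    target = 42
    shaped = output_intensity(tm, phase_conjugate_mask(tm, target))
    flat = output_intensity(tm, [1.0 + 0j] * 64)
    print("shaped wavefront:", round(peak_to_background(shaped, target), 1))
    print("flat wavefront:  ", round(peak_to_background(flat, target), 1))
```

In this toy model the contrast grows with the number of controlled input segments, and any error in the measured phases of the matrix row lowers it, which is consistent with the observation that interferometric calibration yields higher-contrast foci.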

Table 3 | Comparison of different imaging modalities of MFSE
148 numerical aperture of MMF; P: detector; S: light source; SK: CARS Stokes beam; PMT: photomultiplier tube; FOUS: fiber-optic ultrasound sensor; R_l and R_a are the lateral and axial resolution, respectively; O: optical sectioning; A: acoustic sectioning; FWM: four-wave mixing.

analysis of chemical compounds or biological substances by employing Raman, CARS, and second harmonic generation imaging modalities. For example, Gusachenko et al. demonstrated the use of the Raman imaging modality of the MFSE to identify the spatial distribution of different pharmaceuticals in samples 142. Cifuentes et al. successfully employed the polarization-resolved second harmonic generation imaging modality of the MFSE to characterize different structural proteins in mouse tail tendon and heart tissue 148. Since these imaging modalities are label-free imaging techniques based on optical nonlinear effects, the focused light pulses generated by the MMF must have sufficient peak power to obtain strong imaging signals. Therefore, effective dispersion management is crucial in these applications.