[PDF] Optimal birefringence distributions for star test polarimetry

Abstract

Star test polarimetry is an imaging polarimetry technique in which an element with spatially-varying birefringence is placed in the pupil plane to encode polarization information into the point-spread function (PSF) of an imaging system. In this work, a variational calculation is performed to find the optimal birefringence distribution that effectively encodes polarization information while producing the smallest possible PSF, thus maximizing the resolution for imaging polarimetry. This optimal solution is found to be nearly equivalent to the birefringence distribution that results from a glass window being subjected to three uniformly spaced stress points at its edges, which has been used in previous star test polarimetry setups.

Full PDF

OOptimal birefringence distributions for star testpolarimetry A NTHONY V ELLA AND M IGUEL

A. A

LONSO The Institute of Optics, University of Rochester, Rochester NY 14627, USA Aix Marseille Univ, CNRS, Centrale Marseille, Institut Fresnel, UMR 7249, 13397 Marseille Cedex 20,France * [email protected] Abstract:

Star test polarimetry is an imaging polarimetry technique in which an element withspatially-varying birefringence is placed in the pupil plane to encode polarization informationinto the point-spread function (PSF) of an imaging system. In this work, a variational calculationis performed to ﬁnd the optimal birefringence distribution that eﬀectively encodes polarizationinformation while producing the smallest possible PSF, thus maximizing the resolution forimaging polarimetry. This optimal solution is found to be nearly equivalent to the birefringencedistribution that results from a glass window being subjected to three uniformly spaced stresspoints at its edges, which has been used in previous star test polarimetry setups. © 2019 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

1. Introduction

Polarimetry is the measurement of the polarization state of light and/or the polarization propertiesof materials. Such measurements are usually characterized in terms of the Stokes parametersand the Mueller matrix, respectively, which are directly accessible from measurements of theintensity. Imaging polarimetry, in which the polarization is measured as a function of position, isparticularly important in applications ranging from microscopy to remote sensing.Conventional techniques for Stokes polarimetry require multiple intensity measurements, eitherthrough time-sequencing or by splitting diﬀerent polarization components into several separatedetection channels [1, 2]. For example, one common method uses a rotating quarter-wave plate(QWP) followed by a ﬁxed linear polarizer, in which the Stokes parameters are deduced fromsuccessive intensity measurements with the QWP oriented at diﬀerent angles [3]. While thesetechniques can produce highly accurate measurements, they can be relatively complicated and/ortime-consuming, generally involving moving parts or multiple beam paths.When a short acquisition time is desirable, rapid polarization measurements may be takenusing single-shot polarimetry , in which the Stokes parameters are estimated from a singleintensity measurement. A variety of methods exist for single-shot polarimetry, involving gradientindex lenses [4], patterned nanoscale gratings [4], or a split aperture composed of multiplepolarizers [5]. One particularly simple method, referred to as star test polarimetry , uses aspatially-varying birefringent mask (BM) followed by a uniform polarization analyzer in thepupil plane of an exit-telecentric imaging system. With an appropriately chosen birefringencedistribution, the inserted elements can encode full polarization information into the shape ofthe PSF in the rear focal plane of the lens. This method has been demonstrated experimentallyusing a stress-engineered optic (SEO), which is a BK7 glass window subjected to stress withtrigonal symmetry at its periphery, followed by a circular analyzer [6, 7]. Natural applicationsfor this approach are those in which the object is a sparse set of discrete points, such as inastronomy and confocal microscopy. A recent speciﬁc application was the real-time monitoringof the output polarization states of each core within a multicore ﬁber bundle for applications inmedical endoscopes [8]. An extension of this technique is being implemented within the contextof superresolution microscopy for determining the 3D position, orientation, and vibration of a r X i v : . [ a s t r o - ph . I M ] A ug ntrancepupilBM Circularanalyzer f µ zyxf Incidentf ield E Lens ϕ Fig. 1. System layout for a star test polarimetry measurement of an unknown input ﬁeld E ,illustrated for the case in which a single circular polarization component is imaged. Thepupil plane is described by a radial coordinate u = sin θ and an azimuthal angle φ . independent ﬂuorescent molecules [9]. Star test polarimetry can also be applied for imagingcontinuous objects if these are discretized by placing a pinhole array in the object plane of anafocal 4 f relay system with unit magniﬁcation [10, 11]. In all these applications, the polarizationstate of each object point can then be deduced from the corresponding PSF in the image plane,provided that the PSFs from diﬀerent points do not overlap signiﬁcantly. The spatial resolutionof the measurement is thus limited by the size of the PSF. Encoding polarization information inthe PSF necessarily implies increasing its size, since more information is being included in it.The goal of this work is to ﬁnd the birefringence distribution of the mask in the pupil plane thatmaximizes spatial resolution by producing the smallest possible PSF while imposing reasonablerestrictions to ensure that the object’s polarization is eﬀectively encoded in the shape of the PSF.The optimal solution is found to give very similar results to those of the birefringence distributionof the SEO. The solution’s statistical performance (i.e., the expected error in the retrieved Stokesparameters) is assessed by calculating the Fisher information matrix for the measurement.

2. System layout and notation

Consider the system shown in Fig. 1, in which a spatially-varying, transparent, thin birefringentmask with Jones matrix J ( u ) is placed at the pupil plane of an exit-telecentric imaging system,where u = ( u cos φ, u sin φ ) is the pupil coordinate normalized such that u ≤ NA, with NA beingthe system’s numerical aperture. Collimated light (e.g., from a distant localized source) withuniform polarization E is then incident on this BM in the plane of an aperture with pupil function A ( u ) (binary or apodized) and is focused by a lens, producing a polarization-dependent PSF. Theinput polarization may then be deduced from the shape of the PSF by separating two orthogonalpolarization components of the ﬁeld, denoted by e and e , before forming an image of one(or both) of them. For applications where photons are not scarce, only one of the two imagesis often necessary, so the extraction of the component in question (say, e ) can be performedwith a polarizer. As will be discussed later, however, there are applications in which it is best touse all photons, so instead the two components ( e and e ) are separated by using a polarizingbeamsplitter or a Nomarski or Wollaston prism and then they are imaged separately [9].In what follows, all calculations are performed in the circular polarization basis, so that e = ( , ) and e = ( , ) , corresponding to right-hand circular (RHC) and left-hand circular(LHC) polarizations, respectively. If other output polarization components were analyzed, theresults would still be valid after minor modiﬁcations. It is assumed that the imaging lens is slow(typically around 0.05 NA) so that the PSFs are imaged onto several pixels. All calculations thenassume the paraxial limit, and angle-dependent polarization eﬀects at the lens can be neglected. . Birefringence distribution and point-spread function We use here the notation introduced in Ref. [12] to describe spatially-varying birefringence,where the Jones matrix of the BM in the circular polarization basis may be written as J ( u ) = e i Γ ( u ) (cid:20) q ( u ) + iq ( u ) − q ( u ) + iq ( u ) q ( u ) + iq ( u ) q ( u ) − iq ( u ) (cid:21) , (1)with Γ ( u ) being a global phase function, and q ( u ) = cos δ ( u ) , (2a) q ( u ) = sin δ ( u ) cos Θ ( u ) cos Φ ( u ) , (2b) q ( u ) = sin δ ( u ) cos Θ ( u ) sin Φ ( u ) , (2c) q ( u ) = sin δ ( u ) sin Θ ( u ) , (2d)where Θ ( u ) ∈ [− π / , π / ] and Φ ( u ) ∈ [ , π ) are the latitude and longitude on the Poincarésphere of the “slow” eigenpolarization at the pupil point u , and δ ( u ) represents half the retardancebetween the two eigenpolarizations. As discussed in Ref. [12], the four quantities in Eq. (2)can be used to construct a four-dimensional unit vector (cid:174) q ( u ) = ( q , q , q , q ) , and J ( u ) itselfcan be considered as a unit quaternion. A particularly simple visualization of the birefringenceresults from using the 3-vector q = ( q , q , q ) , which points in the direction joining the twoeigenpolarizations in the Poincaré sphere, and whose magnitude encodes the half-retardance δ as | q ( u )| = sin δ ( u ) [12]. Because (cid:174) q and − (cid:174) q correspond to the same eﬀective birefringence, thisvector can always be chosen such that q = ( − | q | ) / ≥ e and e polarization components of the ﬁeld aregiven by e † J ( u ) E and e † J ( u ) E , respectively, where the dagger denotes a conjugate transpose.The paraxial PSF I ( j ) ( x ) of each polarization component e j is the squared modulus of the Fouriertransform of the pupil distribution: I ( , ) ( x ) = (cid:42) (cid:12)(cid:12)(cid:12)(cid:12)∬ A ( u ) e † , J ( u ) E exp [ ik ( u · x )] d u (cid:12)(cid:12)(cid:12)(cid:12) (cid:43) T , (3)where k = π / λ is the wavenumber, x is the two-dimensional spatial coordinate in the imageplane, and (cid:104)·(cid:105) T denotes a time average in the case of partially polarized light. As shown inRef. [12], the phase function Γ ( u ) causes the PSF to increase in size without helping to encodepolarization information. Therefore, its optimal value is a constant, assumed here without loss ofgenerality to be zero.The PSF may be rewritten succinctly by introducing the pupil functions g ( u ) = A ( u )[ q ( u ) + iq ( u )] , h ( u ) = A ( u )[− q ( u ) + iq ( u )] (4)and their Fourier transforms G ( x ) = ∬ g ( u ) exp [ ik ( u · x )] d u , H ( x ) = ∬ h ( u ) exp [ ik ( u · x )] d u . (5)Note that G ( x ) and H ( x ) represent the output e ﬁeld component that results from an RHC or LHCpolarized incident beam, respectively. As shown in Appendix A, the PSF of each polarizationcomponent may then be expressed in terms of the Stokes parameters S , S , S , S of the incidentﬁeld E in the form I ( , ) ( x ) = (cid:213) n = S n I ( , ) n ( x ) , (6)here the normalized intensity contributions I ( , ) n ( x ) are given by I ( , ) ( x ) = | G (± x )| + | H (± x )| , (7a) I ( , ) ( x ) = ± { G ∗ (± x ) H (± x )} , (7b) I ( , ) ( x ) = ± { G ∗ (± x ) H (± x )} , (7c) I ( , ) ( x ) = ± (cid:0) | G (± x )| − | H (± x )| (cid:1) . (7d)As shown in Appendix B, the Fisher information matrix for a measurement of the normalizedStokes vector s = ( S , S , S )/ S when N photons are detected is given by [N F ( s )] mn = N + µ · s (cid:34) (cid:18) µ m µ n + µ · s (cid:19) − µ m µ n + µ · s (cid:35) , (8)where, in what follows, we assume that only the image for the ﬁrst polarization component ( e ) isbeing used, so that µ ( x ) is deﬁned by µ ( x ) = I ( ) ( x )  I ( ) ( x )I ( ) ( x )I ( ) ( x )  (9)and f = ∬ f ( x ) w ( x ) d x represents a weighted integral of a function f with weight w ( x ) = I ( ) ( x )/ ∬ I ( ) ( x ) d x . Note that µ ( x ) is a unit vector, and therefore | µ | ≤

1. The case in whichboth images are used is discussed in Section 9.

4. Constraints on the BM distribution

As mentioned previously, the primary performance metric for a candidate BM distribution is itsimpact on the width of the resulting PSF. However, it is necessary to impose some constraintsin order to avoid solutions that would be unsuitable for polarimetry. For practical purposes,it is useful for the PSFs corresponding to each polarization component to have the same totalpower Ψ ( j ) = ∬ I ( j ) ( x ) d x . This makes the signal-to-noise ratio roughly independent of the inputpolarization, and it ensures that all polarization information encoded by the BM is fully containedwithin the shape of each intensity distribution rather than the overall power. Consequently,the input polarization can be deduced by analyzing the PSF of a single component, which isguaranteed to contain roughly half of the incident photons when the signal is suﬃciently large.The total power of each polarization component can be found by integrating Eq. (6) over x andapplying Parseval’s theorem to each term, leading to Ψ ( , ) = ∬ (cid:104) S (cid:0) | g | + | h | (cid:1) ± S Re { g ∗ h } ± S Im { g ∗ h } ± S (cid:0) | g | − | h | (cid:1)(cid:105) d u = S ∬ A (cid:0) ± β · s (cid:1) d u , (10)where β ( u ) = (− q q + q q , q q + q q , q − q − q + q ) is a unit vector. Then Ψ ( ) = Ψ ( ) = S ∫ A d u for any arbitrary input polarization if (cid:104) β (cid:105) A ≡ ∬ A ( u ) β ( u ) d u = .Note that this condition is not satisﬁed when (cid:174) q is constant, ruling out the possibility of a uniformBM. In fact, from Parseval’s theorem it is easy to see that µ = (cid:104) β (cid:105) A / Ψ ( ) . Therefore, thecondition (cid:104) β (cid:105) A = implies that µ = , which causes a signiﬁcant simpliﬁcation of the Fisherinformation matrix in Eq. (8), in particular making zero the second term inside the brackets thatis guaranteed to be non-positive for the diagonal elements of the matrix. . Derivation of differential equation and boundary conditions When the BM is introduced into the system, the PSF must increase in size since it now containspolarization information. Because the PSF would provide information about four values (theStokes parameters) rather than one, the width of I ( j ) ( x ) can be expected to be roughly twiceas large as that of a system with zero (or uniform) birefringence. Of course, there are manymeasures for the size of the PSF, each of which would give a slightly diﬀerent result. The metricchosen here is the RMS irradiance width, which allows a simple variational calculation. It isexpected that the optimal birefringence distribution for this measure will be nearly optimal forother measures, such as the full width at half maximum (FWHM) or the Strehl ratio.Generally speaking, the width of the PSF does not strongly depend on the incident polarizationstate since the normalized intensity contributions I ( ) n ( x ) ( n = , , ,

3) have similar compositions.Therefore, for simplicity, the performance may be evaluated in terms of the spread of the averagePSF over all possible input polarizations, i.e., over all values of S , S , and S within therange [− S , S ] . This average produces I ( ) ( x ) = S I ( ) ( x ) , which is the PSF for unpolarizedlight. (Since I ( ) ( x ) = I ( ) (− x ) , the analysis is also valid when the PSFs of both polarizationcomponents are used.) As shown in Ref. [12], the squared RMS width r of the PSF forunpolarized light is r = ∬ | x | I ( ) ( x ) d x ∬ I ( ) ( x ) d x = κ ∬ (cid:16) |∇ A | + A (cid:107)∇ (cid:174) q (cid:107) (cid:17) d u , (11)where ∇ is the gradient with respect to u , (cid:107)∇ (cid:174) q (cid:107) denotes the Frobenius norm of the 2 × ∇ (cid:174) q , and κ = k ∬ A d u . (This result can also be derived from Eqs. (6) and (7a).) The terminvolving |∇ A | accounts for diﬀraction, while the second term is the increase in width due to thebirefringence distribution of the BM. Notice that if A represents a hard aperture, the RMS widthis not well-deﬁned since ∇ A diverges at the edge of the pupil. However, the increment ∆ r = κ ∬ A (cid:107)∇ (cid:174) q (cid:107) d u = κ (cid:213) n = ∬ A ∇ q n · ∇ q n d u (12)caused by the BM can be well-deﬁned and ﬁnite even for a hard-aperture pupil; it is this incrementthat we seek to minimize. Even for hard apertures, this measure of increase of the PSF width isexpected to produce meaningful results, which can be veriﬁed by evaluating the FWHM of thesolution for comparison.Since (cid:174) q is a unit vector, it contains only three free components for optimization. A naturalchoice is to use the three-dimensional vector q and let q = ( − | q | ) / . To ﬁnd the solutionswe use a variational approach: we consider a linear combination of the PSF’s width increaseand the three constraints, M = κ ∆ r + Λ · (cid:104) β (cid:105) A , where Λ = ( Λ , Λ , Λ ) is a vector of Lagrangemultipliers and the constant prefactor κ was introduced for future convenience. The variationalprocedure consists of ﬁnding the conditions under which M is stationary under inﬁnitesimalchanges in q . For this purpose, consider a variation (cid:174) q → (cid:174) q + (cid:174) (cid:15) , where (cid:174) (cid:15) = ( (cid:15) , (cid:15) , (cid:15) , (cid:15) ) is aninﬁnitesimal change. Since after this change the vector must remain a unit vector, (cid:174) (cid:15) must beperpendicular to (cid:174) q , so only three of its components are independent parameters, e.g., the zerothcomponent can be chosen to be a function of the remaining ones according to (cid:15) = − (cid:15) · q q . (13)Let us ﬁrst analyze the variations of each of the diﬀerent parts, starting with the PSF widthncrease. It can be seen from Eq. (12) that (cid:174) q → (cid:174) q + (cid:174) (cid:15) causes the change ∆ r → ∆ r + κ (cid:213) n = ∬ A ∇ q n · ∇ (cid:15) n d u = ∆ r − κ (cid:213) n = ∬ (cid:15) n ∇ · (cid:16) A ∇ q n (cid:17) d u + (cid:20) κ ∫ edge A (cid:174) (cid:15) · ∂ ⊥ (cid:174) q d u (cid:21) , (14)where we ignored terms quadratic in (cid:174) (cid:15) and in the second step we removed the derivatives from (cid:15) n by using integration by parts. The term in square brackets is an edge contribution that is presentin the case of hard apertures, where ∂ ⊥ denotes the derivative normal to the edge, and it shouldbe included only if edges are excluded from the region of integration within the second term(since otherwise this edge contribution is already included in this integral). The correspondingvariations in the three constrained quantities (cid:104) β (cid:105) A give (cid:104) β (cid:105) A → (cid:104) β (cid:105) A + ∬ A (− (cid:15) q + (cid:15) q − (cid:15) q + (cid:15) q ) d u , (15a) (cid:104) β (cid:105) A → (cid:104) β (cid:105) A + ∬ A ( (cid:15) q + (cid:15) q + (cid:15) q + (cid:15) q ) d u , (15b) (cid:104) β (cid:105) A → (cid:104) β (cid:105) A + ∬ A ( (cid:15) q − (cid:15) q − (cid:15) q + (cid:15) q ) d u . (15c)In Eqs. (14) and (15), (cid:15) can be expressed in terms of the remaining (cid:15) n by using Eq. (13).The substitution of the resulting variations into M causes a variation of the form M → M + ∬ ( M (cid:15) + M (cid:15) + M (cid:15) ) d u . The total variation of M is then insensitive to the inﬁnitesimalchanges (cid:15) n at any point inside the pupil only if M = M = M = q ∇· ( A ∇ q ) − q ∇· ( A ∇ q ) + A (cid:2) Λ ( q q + q q ) + Λ ( q − q ) − Λ q q (cid:3) = , (16a) q ∇· ( A ∇ q ) − q ∇· ( A ∇ q ) + A (cid:2) Λ ( q − q ) + Λ ( q q − q q ) − Λ q q (cid:3) = , (16b) q ∇· ( A ∇ q ) − q ∇· ( A ∇ q ) + A (cid:2) Λ ( q q + q q ) + Λ ( q q − q q ) (cid:3) = . (16c)Note that for hard apertures, these equations are dominated at the edges by the derivatives actingon A , which is discontinuous. However, as mentioned earlier, it is perhaps more convenient toaccount for variations of M due to the edges by using the edge term in Eq. (14). After followingsimilar steps, it is easy to ﬁnd three edge constraints that can be written succinctly as q ∂ ⊥ q (cid:12)(cid:12)(cid:12) edge = q ∂ ⊥ q (cid:12)(cid:12)(cid:12) edge . (17)This way, Eqs. (16) apply within the smooth regions of the aperture (even inﬁnitesimally close tothe edges) and constitute the diﬀerential equations for q ( u ) to be solved, while Eq. (17) applies atthe edges, providing appropriate boundary conditions.

6. Solution ignoring the boundary conditions

In order to solve these equations, it is illustrative to ﬁrst solve the simpler problem in which theconstraints are ignored in the variational calculation (that is, the Lagrange multipliers are setto zero), as are the boundary conditions. In this case, Eqs. (16a) through (16c) can be writtencompactly as q ∇ · ( A ∇ q ) = q ∇ · ( A ∇ q ) (18)or as a set of several equalities: ∇ · ( A ∇ q ) q = ∇ · ( A ∇ q ) q = ∇ · ( A ∇ q ) q = ∇ · ( A ∇ q ) q . (19)he latter representation highlights the underlying symmetry between the coordinates of (cid:174) q ( u ) onthe Poincaré hypersphere they inhabit [12].Consider the standard case of a circular hard aperture with pupil function A ( u ) = (cid:40) , u ≤ NA , . (20)The rotational symmetry of the pupil combined with the symmetries of the diﬀerential equationsmentioned earlier allow solving the problem through separation of the polar pupil variables u , φ .The solution becomes particularly simple if we assume that the BM is not birefringent at thepupil’s center, i.e., that q = for u =

0. (Note that this choice causes no loss in generality, sincethe cascading of the resulting BM with plates with uniform birefringence before and/or after itgives access to other solutions.) It is easy to show that a set of solutions to Eq. (18) is given by q ( u , φ ) = b m u | m | + b m u | m | [ cos ( m φ ) , sin ( m φ ) , ] , (21)for integer m , and where b m is a constant. (A derivation of this result in terms of a stereographicprojection of the hyperspherical coordinates (cid:174) q is given in Ref. [13].) Note that we also madethe choice of setting q =

0. Again, there is no loss of generality in this choice, because othersolutions can be found through cascading with uniform birefringent plates.The solutions in Eq. (21) turn out to automatically satisfy the constraints (cid:104) β (cid:105) A = (cid:104) β (cid:105) A = (cid:104) β (cid:105) A = (cid:104) β (cid:105) A π NA = ∫ NA0 b m u | m | − b m u | m | + ( + b m u | m | ) u d u = ∫ ( b m NA | m | v | m | ) − ( b m NA | m | v | m | ) + (cid:2) + ( b m NA | m | v | m | ) (cid:3) v d v , (22)where we used the change of variables v = u / NA ∈ [ , ] in the last step. The value of thisexpression is plotted in Fig. 2 as a function of b m NA | m | for | m | = , , . . . ,

10. Observe thatwhen 1 ≤ | m | ≤

4, there are exactly two values of b m for which (cid:104) β (cid:105) A =

0. For each of theseeight cases, the values of the increment in the RMS width of the PSF can be calculated by using ∆ r = π λ NA (cid:34) π (cid:213) n = ∫ π ∫ (∇ v q n · ∇ v q n ) v d v d φ (cid:35) / = π λ NA (cid:18) ∫ (cid:26) (cid:2) ¯ δ (cid:48) ( v ) (cid:3) + m v sin ¯ δ ( v ) (cid:27) v d v (cid:19) / , (23)where ¯ δ ( v ) = δ ( v NA ) . Note from this expression that solutions with higher | m | tend to cause largerincreases in the size of the PSF. For the solutions in Eq. (21), ¯ δ ( v ) = arctan ( b m NA | m | v | m | ) . Thenumerical results obtained for each solution are shown in the inset in Fig. 2, where we can see thatthe solution with | m | = b = . / NA produces the smallest increase in PSF width, which is ∆ r = . λ / NA. The resulting half-retardance distribution is then ¯ δ ( v ) = ( . v ) .While the solutions found above satisfy the constraints (cid:104) β (cid:105) A = , the fact that these constraintswere ignored at the outset in the variational calculation means that they cannot simultaneouslysatisfy this constraint and the boundary condition in Eq. (17), which for the hard circular apertureconsidered here (where the normal derivative ∂ ⊥ reduces to a radial derivative) is given by¯ δ (cid:48) ( ) = . (24)These solutions are therefore not optimal, but they provide insights that are useful whenconsidering the true optimal solution. j m j b m £ NA j m j ¼ NA ¯ A ¢ r £ NA/λ for each solution:0.239 0.4350.414 0.6220.591 0.7610.786 0.860| m | = m | = m | = m | = Fig. 2. Value of Eq. (22) as a function of b m (times a scaling factor) for several values of | m | .Note that this quantity can vanish only for 1 ≤ | m | ≤

4. The inset shows the correspondingvalues of ∆ r for the eight values of | m | and b m for which the condition (cid:104) β (cid:105) A =

7. Optimal birefringence distribution

Let us now go back to the general equations (16) while again considering a hard circular pupil.Drawing inspiration from the previous case, we propose a separable solution constrained tothe q q plane, namely q ( u , φ ) = sin δ ( u ) [ cos ( m φ ) , sin ( m φ ) , ] with m (cid:44)

0, which automaticallysatisﬁes the constraints (cid:104) β (cid:105) A = (cid:104) β (cid:105) A =

0. Notice from Eqs. (2) that this ansatz corresponds to q ( u ) = cos δ ( u ) , Θ =

0, and Φ = m φ . The substitution of this ansatz into Eq. (16c) implies that Λ = Λ =

0, and the resulting form of both Eqs. (16a) and (16b) gives a diﬀerential equation forthe half-retardance δ , which after changing variables to v = u / NA becomes¯ δ (cid:48)(cid:48) ( v ) + ¯ δ (cid:48) ( v ) v + (cid:18) NA Λ − m v (cid:19) sin (cid:2) δ ( v ) (cid:3) = , (25)for v ∈ [ , ] , where the assumption of no birefringence at the pupil center translates into thecondition ¯ δ ( ) =

0, and the boundary condition in Eq. (24) must be satisﬁed. Additionally,the solution must satisfy the constraint (cid:104) β (cid:105) A =

0, which implies ∫ cos (cid:2) δ ( v ) (cid:3) v d v = Λ . Like for the case considered in the previoussection, the optimal solution corresponds to | m | =

1. This solution is shown in Fig. 3, and it turnsout to be very well approximated by the quadratic expression 1 . v − . v (the R-squared ofthe ﬁt being 0.9996). This solution achieves an increase in the PSF width of ∆ r = . λ / NA,which is indeed slightly smaller than that found in the previous section.

8. Performance evaluation and comparison to SEO

As discussed in the introduction, star test polarimetry is implemented experimentally by usingan SEO having three points of symmetric stress. The birefringence distribution near the centerof an SEO turns out to be approximately given by the same general form as the previous twosolutions, namely q ( u , φ ) = sin δ ( u ) [ cos ( m φ ) , sin ( m φ ) , ] , with m = − δ ( u ) = cu , with c being a measure of the stress in the SEO [6, 7].Given the similarity of this birefringence distribution to the optimal one, the constraints (cid:104) β (cid:105) A = ( v ) Solution ignoringboundary conditionsOptimal solutionSEO

Fig. 3. Radial retardance distribution ¯ δ ( v ) for the optimal BM solution ignoring the boundaryconditions, the true optimum, and an SEO with stress coeﬃcient c = . / NA, plotted asfunctions of the normalized radial pupil coordinate v .Table 1. Performance metrics for the PSFs resulting from using three BM distributionscharacterized by the half-retardance functions in the second column, as well as for adiﬀraction-limited system, when the incident ﬁeld is unpolarized.Solution ¯ δ ( v ) ∆ r FWHM Strehl ratioOptimal ≈ . v − . v . λ / NA 0 . λ / NA 0.490Solution w/o BC’s arctan ( . v ) . λ / NA 0 . λ / NA 0.474SEO 1 . v . λ / NA 0 . λ / NA 0.469Diﬀraction limit 0 Not applicable 0 . λ / NA 1.000 and (cid:104) β (cid:105) A = (cid:104) β (cid:105) A π NA = ∫ NA0 cos ( cu ) u d u = (cid:20) cos ( c NA ) −

12 sinc ( c NA ) (cid:21) sinc ( c NA ) = , (26)where sinc ( x ) = x − sin x . This equation has inﬁnitely many solutions for c , but the one thatleads to the smallest PSF width increase is the ﬁrst one, c = . / NA, shown in Fig. 3.Table 1 provides a comparison of the performance metrics of the BM solutions found in theprevious two sections and the SEO. The half-retardance distributions are also summarized. Allthree solutions correspond to | m | =

1. (The SEO corresponds speciﬁcally to m = −

1, but theresults are the same for m =

1, which corresponds to an SEO followed a uniform half-waveplate.) The optimal solution does outperform the two other solutions not only in RMS widthincrease ∆ r (calculated through Eq. (23)) for which it was optimized, but also for two otherperformance metrics: the FWHM and the Strehl ratio. However, the diﬀerences in these metricsbetween the optimal solution and the SEO are of only between 5 and 10%. Also provided forcomparison are corresponding metrics for a diﬀraction-limited imaging system where no BM isused (or equivalently where the BM is uniform). We can see that the cost of encoding polarizationinformation into the PSF (on average over all possible polarizations) is an increase in its FWHMof about 60% and a reduction of the Strehl ratio by about 50%.The functions G ( x ) and H ( x ) and the corresponding PSF contributions I ( j ) n ( x ) for the optimalBM are shown in Fig. 4. Note that the PSF contributions involve negative contributions for n ≥

1, but the total measured PSF, which corresponds to the superposition in Eq. (6), is alwaysnon-negative as a result of the constraint S ≥ S + S + S . For the PSF contributions, the ( x ) (1) 1 ( x ) (1) 2 ( x ) (1) 3 ( x ) (1)0 ( x ) (2) 1 ( x ) (2) 2 ( x ) (2) 3 ( x ) (2) M a gn it ud e Phase G ( x ) H ( x ) Fig. 4. For the optimal BM solution: complex ﬁelds G and H (left, with amplitudecross-sections through the center shown on the right), and PSF contributions I ( j ) n ( x ) to thePSFs of the e (top row) and e (bottom row) output polarization components. The plots areshown over a square region with half-width 1 . λ / NA. −1.0 −0.5 0.0 0.5 1.0 NA x n ( x , 0) (1) −0.50−0.250.000.250.50 n = 0 (optimal) n = 0 (SEO) n = 1 (optimal) n = 1 (SEO) n = 2 (optimal) n = 2 (SEO) n = 3 (optimal) n = 3 (SEO) Fig. 5. Horizontal slices through y = I ( ) n . The solid anddashed curves correspond to the optimal birefringence distribution and an SEO, respectively. top/bottom row corresponds to the case where the right/left circularly polarized componentemerging from the BM is focused on the detector. Notice that, for these solutions, I ( j ) n ( x ) for n = , , j (the circular component being used) while I ( j ) ( x ) changes by aglobal sign. That is, the symmetries of the functions G ( x ) and H ( x ) are such that the only signchange ( ± ) that has an eﬀect in Eqs. (7) is that in Eq. (7d). The equivalent plots for an SEO arevirtually indistinguishable and are therefore not shown. Instead, to appreciate the very subtlediﬀerences, plots of the horizontal cross-sections I ( ) n ( x , ) for both the optimal BM solution andthe SEO are shown in Fig. 5. In practice, the eﬀect of these diﬀerences is probably insigniﬁcantcompared with variations that would arrive, for example, from the pixelization of the detector.The accuracy of the measurement can be assessed by using the Fisher information matrix inEq. (8), which given the constraints imposed on the solutions reduces to [N F ( s )] mn = N (cid:18) µ m µ n + µ · s (cid:19) . (27)The inverse of this matrix provides an estimate of the variance in the accuracy of the measure- a) (b) (c) (d) e e and e Fig. 6. Two-dimensional cross-sections of the error ellipsoids for incident polarizationstates in the s - s and s - s cross-sections (shown only for s ≥ e outputpolarization component is measured with N = N = ments. That is, for a given incident polarization ˆ s , the retrieved normalized Stokes param-eter vector s is expected (within one standard deviation) to be within an ellipsoid given by ( s − ˆ s ) · [N F ( s )] · ( s − ˆ s ) =

1. Figure 6 shows cross-sections (in red) of these ellipsoids for severalincident polarization states, both for the optimal BM and for the SEO, for the case of N = s - s planefor the optimal BM, but overall the performance is essentially the same. Note that these expecteddeviations of the normalized Stokes vectors scale as N − / , and that the width of the ellipsoids inthe plane normal to the cross-sections is comparable to that in the azimuthal direction.The fact that all solutions described so far have the form q ( u , φ ) = sin δ ( u ) [ cos ( φ ) , ± sin ( φ ) , ] ,where δ ( u ) is linear near the center, can be understood in terms of the two circular components ofthe incident light. Recall that the BM is followed by an RHC polarizer and a Fourier-transforminglens before reaching the detector. Since the birefringence is smallest at the center of the BMand grows away from it, the BM/RHC polarizer combination acts as a rotationally-symmetricapodizer for the incident RHC-polarized light. The incident LHC-polarized component, on theother hand, gets converted into RHC light away from the BM’s center, and acquires a phase vortexdue to a geometric phase eﬀect. After Fourier transformation by the lens, the ﬁeld distributionsat the detector plane for these components are precisely the distributions G and H (shown inFig. 5 for the optimal BM), whose forms are a localized central lobe with uniform phase and aslightly larger “donut” with a phase vortex of unit charge at its center, respectively. The radialextent of the latter would increase with | m | , the magnitude of the vortex charge written by theBM on H , so | m | = s direction following from the variation in the radial direction of the relative weights of the twoeld components is examined in Fig. 7(a) for both the optimal solution and the SEO. While theuniformity is not perfect, all points are represented in similar amounts and the centroid of thedistribution is zero, as guaranteed by the constraint µ =

0. In practice, of course, the samplingof these diﬀerent polarization components is compromised by detector pixelation, and even if thepixels were to be made very small, this would come at the cost of each detecting less photons.However, it is clear that this type of measurement provides a more “democratic” coverage of thesphere when compared, say, to a polarimeter where six polarization components are measured.

9. Results when both polarization components are measured

So far we have assumed that only one circular polarization component ( e ) emerging from the BMis focused to form an image. However, as mentioned in the introduction, it is possible to insteadseparately image both circular components and analyze the two resulting PSFs jointly. Whiledoing so requires a more complex system, there are applications in which it is advantageous [9].One clear advantage is that the number of detected photons is essentially doubled, reducing theuncertainty in the estimation of the Stokes parameters by roughly a factor of 1 /√

2. Note, however,that when both images are used, µ vanishes automatically (where the bar now indicates weightedintegration over both images) because the total number of detected photons is independentof the polarization state. The Fisher information matrix is then given by the simpler form inEq. (27) without having to impose the constraints introduced in Section 4. While removing theseconstraints would clearly aﬀect the variational derivation presented earlier, we now show that theresults found earlier remain nearly optimal for the case when both components are imaged.Consider again the representative case of nearly unpolarized light ( s ≈ ), for which theuncertainty in the estimation of the Stokes parameters is largest, and for which the Fisherinformation matrix in Eq. (27) reduces to N times µ m µ n . Given the forms found for I ( j ) n ( x ) (antisymmetric in x and symmetric in y for n =

1, symmetric in x and antisymmetric in y for n =

2, and rotationally symmetric for n = × µ m µ n isautomatically diagonal. Further, because µ ( x ) is a unit vector, the trace of this matrix is unity.The uncertainty in the measurement is then minimized if the three diagonal elements have equalmagnitude, that is, if µ n = / n = , ,

3. It turns out that the optimal BM solution derivedin Section 7 nearly achieves this, giving µ = µ = .

32 and µ = .

35, while the SEO gives µ = µ = .

35 and µ = .

30. The cross-sections of the uncertainty ellipsoids for both theoptimal BM solution and the SEO are shown (in blue) in Fig. 6 for the case in which 1500photons are detected in each PSF, giving a total number of photons of N = /√ e component was used (red). Also, the uncertainties are nowexactly symmetric in s , as is the coverage of s shown in Fig. 7(b).Finally, note that the relations µ m µ n = µ n δ mn and µ n = , valid for the optimal and SEOsolutions (and whether only e or both e and e components are imaged), imply that the fourbasic PSFs, I ( j ) n ( x ) , are orthogonal (and almost orthonormal since µ n ≈ /

3) under the weight1 /I ( j ) ( x ) . This orthogonality is useful in the retrieval of the parameters from the measured PSFs.

10. Concluding remarks

A variational calculation was performed to determine the spatial birefringence distribution to beplaced at the pupil plane of an imaging system that optimizes the encoding of polarization inthe PSF’s shape while keeping the PSF size as small as possible. This optimal birefringencepattern was found to be similar (both in distribution and performance) to that found naturallynear the equilibrium points of a transparent window under stress: the mask is not birefringent atits center, but its retardance grows with the radial pupil coordinate u , while the orientation of the R e l a ti v e po w e r (a) (b) Optimal BM SEO −1 0 1 R e l a ti v e po w e r Optimal BM SEO

Fig. 7. Histograms of the power coverage of diﬀerent values of s (characterized by µ ) overthe extent of the PSFs, for the cases when (a) only the PSF for the e component is used, and(b) when the PSFs for both components are used. slow and fast axes rotates with the azimuthal pupil coordinate φ as ± φ /

2. (As mentioned earlier,other solutions with equal performance correspond to birefringence distributions that result fromthe cascading of these solutions with uniform wave plates.) The only small diﬀerences were inhow the retardance grows with u away from the center, but these cause only small variations inthe resulting PSFs. In fact, if the variational derivation had been based on a diﬀerent measureof PSF size and/or requirements for polarimetric optimality (e.g. a measure of the uniformityof µ in Fig. 7), the details of the optimal δ ( u ) would change slightly, these variations beingcomparable to those between any of these optimal solutions and the SEO. That is, the mechanicsof stress-induced birefringence provide a simple way to produce a nearly optimal bireﬁngencedistribution for imaging polarimetry, without the need for nano-fabrication or combinations ofspatial light modulators: by applying a pressure distribution with trigonal symmetry to the edgesof a a glass window, a nearly-optimal birefringence mask (the SEO) results.Since the aim of this work was to show that the optimal BM distribution has very similarcharacteristics and performance as the SEO, we considered the continuous functional form of thePSFs. The performance of these devices in real applications, however, will also depend on thepixelation and general characteristics of the detector, as well as on the algorithms used for theretrieval of the Stokes parameters from the measured PSFs. Preliminary work on this direction canbe found in Ref. [13]. These aspects are particularly important in applications such as ﬂuorescencemicroscopy [9], where photons are scarce and the PSFs are used to extract information not onlyabout the two-dimensional polarization studied here, but of the three-dimensional polarization(as well as the three-dimensional position) of the light emitted by a ﬂuorophore. For suchapplications, the near-optimality of the SEO is of central importance. References

1. C. F. LaCasse, R. A. Chipman, and J. S. Tyo, “Band limited data reconstruction in modulated polarimeters,” Opt.Express , 14976–14989 (2011).2. R. Azzam, “Stokes-vector and Mueller-matrix polarimetry,” J. Opt. Soc. Am. A , 1396–1408 (2016).3. E. Collett, Polarized Light: Fundamentals and Applications (Marcel Dekker, New York, 1993).4. J. Chang, N. Zeng, H. He, Y. He, and H. Ma, “Single-shot spatially modulated Stokes polarimeter based on a GRINlens,” Opt. Lett. , 2656–2659 (2014).5. R. Chipman, “Chapter 22: Polarimetry,” in Handbook of Optics, Vol. II,

M. Bass, ed. (McGraw-Hill, New York,1995).6. R. Ramkhalawon, A. M. Beckley, and T. G. Brown, “Star test polarimetry using stress-engineered optical elements,”in

Three-Dimensional and Multidimensional Microscopy: Image Acquisition and Processing XIX, vol. 8227(International Society for Optics and Photonics, 2012), p. 82270Q.7. R. D. Ramkhalawon, T. G. Brown, and M. A. Alonso, “Imaging the polarization of a light ﬁeld,” Opt. Express ,4106–4115 (2013).. S. Sivankutty, E. R. Andresen, G. Bouwmans, T. G. Brown, M. A. Alonso, and H. Rigneault, “Single-shot polarimetryimaging of multicore ﬁber,” Opt. Lett. , 2105–2108 (2016).9. V. Curcio, T. G. Brown, S. Brasselet, and M. A. Alonso, “Birefringent Fourier ﬁltering for single molecule Coordinateand Height super-resolution Imaging with Dithering and Orientation (CHIDO),” arXiv preprint arXiv:1907.05828(2019).10. B. G. Zimmerman, R. Ramkhalawon, M. Alonso, and T. G. Brown, “Pinhole array implementation of star testpolarimetry,” in Three-Dimensional and Multidimensional Microscopy: Image Acquisition and Processing XXI, vol.8949 (International Society for Optics and Photonics, 2014), p. 894912.11. B. G. Zimmerman and T. G. Brown, “Star test image-sampling polarimeter,” Opt. Express , 23154–23161 (2016).12. A. Vella and M. A. Alonso, “Poincaré sphere representation for spatially varying birefringence,” Opt. Lett. ,379–382 (2018).13. A. Vella, “Description and applications of space-variant polarization states and elements,” Ph.D. thesis, University ofRochester (2018).14. J. S. Tyo, D. L. Goldstein, D. B. Chenault, and J. A. Shaw, “Review of passive imaging polarimetry for remotesensing applications,” Appl. Opt. , 5453–5469 (2006).15. A. Vella, “Tutorial: Maximum likelihood estimation in the context of an optical measurement,” arXiv preprintarXiv:1806.04503 (2018). AppendixA. Point-spread function

The polarization-dependent PSF from Section 3 can be derived as follows. Consider the pupilfunctions g ± ( u ) = A ( u )[ q ( u ) ± iq ( u )] , h ± ( u ) = A ( u )[] − q ( u ) ± iq ( u )] (28)and their Fourier transforms G ± ( x ) = ∬ g ± ( u ) exp [ ik ( u · x )] d u , H ± ( x ) = ∬ h ± ( u ) exp [ ik ( u · x )] d u . (29)(Note that the “ + ” subscripts correspond to the quantities g , h , G , and H deﬁned in Eqs. (4) and(5).) Then Eqs. (1) and (3) lead to the following expression for the PSF: I ( , ) ( x ) = (cid:68)(cid:12)(cid:12) G ± ( x ) E , ± H ± ( x ) E , (cid:12)(cid:12) (cid:69) T , (30)where E and E are the right- and left-circular polarization components of the input ﬁeld. Thiscan be expanded to obtain I ( , ) ( x ) = (cid:10) | G ± | | E , | + | H ± | | E , | ± { G ∗± H ± E ∗ , E , } (cid:11) T = | G ± | (cid:104)| E , | (cid:105) T + | H ± | (cid:104)| E , | (cid:105) T ± { G ∗± H ± }(cid:104) Re { E ∗ , E , }(cid:105) T ∓ { G ∗± H ± }(cid:104) Im { E ∗ , E , }(cid:105) T = (cid:104) | G ± | ( S ± S ) + | H ± | ( S ∓ S ) ± { G ∗± H ± } S + { G ∗± H ± } S (cid:105) , (31)where in the last step, the electric ﬁeld was rewritten in terms of the Stokes parameters of theincident ﬁeld, given by S = (cid:104)| E | (cid:105) T + (cid:104)| E | (cid:105) T , S = (cid:104) Re { E ∗ E }(cid:105) T , S = (cid:104) Im { E ∗ E }(cid:105) T , S = (cid:104)| E | (cid:105) T − (cid:104)| E | (cid:105) T . (32)Thus, the PSF can be written as I ( , ) ( x ) = (cid:213) n = S n I ( , ) n ( x ) , (33)here the normalized intensity contributions associated with each Stokes parameter are given by I ( , ) ( x ) = | G ± ( x )| + | H ± ( x )| , I ( , ) ( x ) = ± { G ∗± ( x ) H ± ( x )} , I ( , ) ( x ) = { G ∗± ( x ) H ± ( x )} , I ( , ) ( x ) = ± (cid:0) | G ± ( x )| − | H ± ( x )| (cid:1) . (34)These results are simpliﬁed by noting that g − ( u ) = g ∗ + ( u ) and h − ( u ) = h ∗ + ( u ) . From Eq. (29), itfollows that G − ( x ) = G ∗ + (− x ) and H − ( x ) = H ∗ + (− x ) , leading to the form in Eq. (7). B. Fisher information

The Fisher information matrix for a measurement of the normalized Stokes vector s may be derivedusing the formalism shown in Ref. [15]. The probability density function for a measurement ofthe continuous PSF is deﬁned as P ( x | s ) = I ( ) ( x ) ∬ I ( ) ( x ) d x . (35)Using Eq. (6), this expression can be written in terms of the normalized Stokes parameters as P ( x | s ) = I ( ) (cid:32) + (cid:213) n = s n I ( ) n I ( ) (cid:33)∬ I ( ) (cid:32) + (cid:213) n = s n I ( ) n I ( ) (cid:33) d x = w ( x ) + µ ( x ) · s + µ · s , (36)where the abbreviations in Eq. (9) and the paragraph that follows it were used. The elements ofthe Fisher information matrix may be derived from the log-likelihood function (cid:96) ( s | x ) = ln (cid:20) w ( x ) + µ ( x ) · s + µ · s (cid:21) . (37)The ﬁrst two derivatives of this function with respect to the normalized Stokes parameters are ∂(cid:96)∂ s n = µ n + µ · s − µ n + µ · s , ∂ (cid:96)∂ s m ∂ s n = − µ m µ n ( + µ · s ) + µ m µ n ( + µ · s ) . (38)Then the ( m , n ) th element of the unit Fisher information matrix for a single-photon measurementis deﬁned as [ F ( s )] mn = − ∬ (cid:18) ∂ ∂ s m ∂ s n (cid:96) ( s | x ) (cid:19) P ( x | s ) d x = − ∬ w ( x ) + µ · s + µ · s (cid:20) − µ m µ n ( + µ · s ) + µ m µ n ( + µ · s ) (cid:21) d x = (cid:18) + µ · s ∬ µ m µ n + µ · s w ( x ) d x (cid:19) − (cid:18) µ m µ n ( + µ · s ) ∬ ( + µ · s ) w ( x ) d x (cid:19) . (39)The integral in the second term evaluates to 1 + µ · s , which cancels with one factor in thedenominator. This simpliﬁcation leads to Eq. (8). Funding

National Science Foundation (NSF) (PHY-1507278); Excellence Initiative of Aix MarseilleUniversity - A ∗ MIDEX, a French “Investissements d’Avenir” programme.