[PDF] Quantitative photoacoustic oximetry imaging by multiple illumination learned spectral decoloring

Abstract

Significance: Quantitative measurement of blood oxygen saturation (sO_2) with photoacoustic (PA) imaging is one of the most sought after goals of quantitative PA imaging research due to its wide range of biomedical applications. Aim: A method for accurate and applicable real-time quantification of local sO_2 with PA imaging. Approach: We combine multiple illumination (MI) sensing with learned spectral decoloring (LSD); training on Monte Carlo simulations of spectrally colored absorbed energy spectra, in order to apply the trained models to real PA measurements. We validate our combined MI-LSD method on a highly reliable, reproducible and easily scalable phantom model, based on copper and nickel sulfate solutions. Results: With this sulfate model we see a consistently high estimation accuracy using MI-LSD, with median absolute estimation errors of 2.5 to 4.5 percentage points. We further find fewer outliers in MI-LSD estimates compared to LSD. Random forest regressors outperform previously reported neural network approaches. Conclusions: Random forest based MI-LSD is a promising method for accurate quantitative PA oximetry imaging.

Full PDF

QQuantitative photoacoustic oximetry imaging by multipleillumination learned spectral decoloring

Thomas Kirchner a , Martin Frenz a,* a Biomedical Photonics, Institute of Applied Physics, University of Bern, Bern, Switzerland

Abstract.Signiﬁcance:

Quantitative measurement of blood oxygen saturation (sO ) with photoacoustic (PA) imaging is one ofthe most sought after goals of quantitative PA imaging research due to its wide range of biomedical applications. Aim:

A method for accurate and applicable real-time quantiﬁcation of local sO with PA imaging. Approach:

We combine multiple illumination (MI) sensing with learned spectral decoloring (LSD); training on MonteCarlo simulations of spectrally colored absorbed energy spectra, in order to apply the trained models to real PAmeasurements. We validate our combined MI-LSD method on a highly reliable, reproducible and easily scalablephantom model, based on copper and nickel sulfate solutions.

Results:

With this sulfate model we see a consistently high estimation accuracy using MI-LSD, with median absoluteestimation errors of . to . percentage points. We further ﬁnd fewer outliers in MI-LSD estimates compared toLSD. Random forest regressors outperform previously reported neural network approaches. Conclusions:

Random forest based MI-LSD is a promising method for accurate quantitative PA oximetry imaging.

Keywords: qPAI, MIS, photoacoustics, machine learning, blood oxygen saturation, spectral coloring. * [email protected] A robust and accurate quantitative measurement of blood oxygen saturation (sO ) with photoa-coustic (PA) imaging, also called optoacoustic imaging, is one of the most sought after goals ofquantitative PA imaging (qPAI) research due to its wide range of immediate applications. UsuallyqPAI research aims to achieve an absolute quantiﬁcation of optical properties, like the absorptioncoefﬁcient µ a , from measured PA signals S ( d , t ) recorded at times t at detector position d .

1, 2

Inbrief, such a quantiﬁcation of µ a encompasses a solution of two ill posed inverse problems. (1) Theacoustic inverse problem from S ( d , t ) to an initial pressure spatial distribution p ( x ) . And (2) theoptical inverse problem from H ( x ) = p ( x ) / Γ( x ) = φ ( x , µ a ( x ) , µ (cid:48) s ( x )) · µ a ( x ) to µ a ( x ) , ata location x , with the Gr¨uneisen parameter Γ and the reduced scattering coefﬁcient µ (cid:48) s . There theﬂuence φ is dependent on unknowns like the absorption and scattering in the tissue surrounding x . qPAI methods either depend on model-based inversion or data-driven approaches. Theseapproaches often perform well in silico but struggle with the translation to real measurements ineither phantoms or in vivo .In PA imaging, sO estimations are derived from multispectral PA measurements by ﬁrst per-forming an acoustic reconstruction yielding images of the PA signal S ( x , λ ) = Γ( x ) · A ( x ) · φ ( x , µ a ( x , λ ) , µ (cid:48) s ( x , λ )) · µ a ( x ) (1)for each measured wavelength λ , with A ( x ) being an unknown spatially varying factor intro-duced by the imperfectly solved acoustic ill-posed inverse problem (i.e. image reconstruction fromdata with limited frequency bandwidth and a limited probe aperture). Using a linear image recon-struction, the acoustic inverse problem can be assumed as wavelength independent. The spectral1 a r X i v : . [ phy s i c s . m e d - ph ] F e b oloring due to the wavelength dependent ﬂuence variation causes the dominant distortion in anysO estimation made from multispectral signal stacks S ( x , λ ) . This spectral coloring of PA signalsneeds to be corrected for in order to yield accurate quantitative estimates of sO . To address thisneed, we combine two approaches to quantitative PA imaging of sO . (1) Multiple illumination(MI) sensing – a method in which a sequence of PA measurements is acquired with a sequenceof illuminations at different positions. Usually, effective attenuation of the illumination is then es-timated with diffusion theory and then used for correcting spectral coloring. (2) Learned spectraldecoloring (LSD) – a data science method in which a machine learning algorithm is trained onMonte Carlo simulations of spectrally colored multispectral PA measurements in order to decolorreal measurements.Both these methods can yield promising results on their own but still suffer from a range ofconstraints. I.e. MI sensing implementations typically assume and use point illuminations whichenables the use of closed-form solutions of the diffusion approximation of light propagation, but limit SNR due to the laser safety limit for skin. The resulting long acquisition times makethis method difﬁcult to translate to realistic macroscopic applications. Furthermore, MI sensingso far has theoretical limits in highly inhomogeneous scenes due to its reliance on the diffusionapproximation. MI sensing implementations usually aim to estimate absolute values of µ a , whichgoes beyond what is needed for an estimation of sO . LSD

15, 19 and similar spectral approaches currently yield accurate in silico estimations and plausible initial results in highly constrainedsettings but they have insufﬁcient input to robustly generalize these results over diverse geometriesand applications. Both MI sensing and LSD are not yet thoroughly validated partially due to a lackof stable and reliable sO phantoms.Even though substantial progress has been made in dynamic blood ﬂow phantoms for photoa-coustic imaging validation, these blood or red blood cell suspension phantoms require extensiveﬁne tuning and even then yield reference values with limited accuracy. At best a reference mea-surement of − % is achievable with state of the art blood ﬂow phantoms.

22, 23

Rather than implement such a sO ﬂow phantom we used copper and nickel sulfate solutions ina relative copper sulfate model similar to work by Buchmann et al. to mimic absorption spectraof differently oxygenated blood. This allowed a reliable sub % error in our ground truth andallowed us to rapidly manufacture stable and highly reproducible phantoms with wide variationsin optical properties in order to generate high quality test sets for spectral decoloring methods. We investigated a method combining learned spectral decoloring (LSD) and multiple illumination(MI) measurements. To that effect we1. Developed a system to perform real-time multiple illumination multispectral PA imaging.2. Implemented modiﬁed LSD machine learning algorithms using MI.3. Used these algorithms to train on in silico data from Monte Carlo optical forward simulationswith a relative copper sulfate model.4. Validated and tested machine learning models trained only on in silico data on comprehen-sive phantom measurements using that copper and nickel sulfate based sO model.2 .1 Multiple illumination photoacoustic imaging Fig 1

Multiple illumination (MI) photoacoustic(PA) imaging setup. Illumination via fast tunableoptical parametric oscillator (OPO) laser sequen-tially illuminating ﬁber bundles using a galvo mir-ror system driven by an arbitrary waveform genera-tor (AWG). PA signals were measured with a lineararray ultrasound (US) probe and recorded by a 64channel US data acquisition (DAQ) system. An USgel pad is used to allow for in plane illumination.

Our MI PA imaging setup is illustrated in ﬁgure 1. It uses a fast wavelength-tunable opticalparametric oscillator (OPO) laser system (prototype SpitLight, InnoLas Laser GmbH, Krailling,Germany) with ﬁve nanosecond pulse duration and 100 Hz pulse repetition frequency. The laserpulses were sequentially coupled into four high power ﬁber bundles (FiberOptic P.+P. AG, Spreit-enbach, Switzerland) with NA 0.22 ﬁbers, each bundle with a 2 mm diameter. This was achievedusing a galvo mirror system (GVS011/M, Thorlabs Inc., Newton, USA) driven by an arbitrarywaveform generator (TG5011, Aim-TTi, Cambridgeshire, United Kingdom), which was synchro-nized with the laser system. The ﬁber bundle output sides were arranged in a line array with 8 mmspacing. The illumination pulses were attenuated to have a maximum energy of 10 mJ per pulse atthe ﬁber output. To comply with ANSI Safety limits

17, 25 the beams are widened to 7 mm full widthat half maximum (FWHM) at the tissue or phantom surface. Illumination and acoustic detectionensues through 18 mm thick Ultrasound gel pad (Parker Laboratories Inc., Fairﬁeld, USA). Wemeasure the 64 center channels of a 128 element linear array transducer (L7-4, Advanced Technol-ogy Laboratories Inc., Bothell, USA) with a center frequency of 5 MHz, a pitch of 0.3 mm and afractional bandwidth of 80 %. The number of acquisition channels was limited by our 64 channelUS data acquisition system (V-1-64, Verasonics, Inc., Kirkland, USA). For this study, we used thefull tuning range of our OPO and acquired PA measurements for 16 equidistant wavelengths from680 nm to 980 nm in 20 nm steps, each for four illumination positions. After ﬁring one pulse of onewavelength in each ﬁber bundle the wavelength is tuned to the next in sequence. Using this 4 ×

16 sequence, each MI and multispectral stack of PA images takes 640 ms to acquire. We generallyrecorded the raw data for 30 such stacks for each scan. Live beamforming and visualisation with25 fps was performed using custom matlab scripts but this live visualisation was solely used forprobe positioning and quality control (e.g. avoiding air inclusions under the gel pad).

The acoustic reconstruction of PA images for further analysis was performed using the PA imageprocessing module from the Medical Imaging Interaction Toolkit (MITK). The raw data wasbeamformed using a delay and sum (DAS) algorithm, with a ﬁxed speed of sound of 1480 ms − and a Hann apodization over an angle of ± degrees. For noise reduction, the beamformeddata was bandpassed. A B-Mode image was formed by using an envelope detection ﬁlter and3ownsampling the result to a 0.15 mm isometric resolution. The full image processing pipelineincluding all relevant parameters is part of the open source appendix. The B-mode images werecorrected for the mean laser pulse energy at a speciﬁc wavelength. This mean laser pulse energycorrection was determined directly at the ﬁber bundles output before the experiments – averagingthe pulse energy for 30 laser pulses of each wavelength. For a single wavelength the variation ofpulse energy was less than 3 %, to reduce this noise component’s inﬂuence, we also averaged ourPA measurements over 30 full stacks of measurements. The phantoms used, consisted of arrays of polythene tubing (Smiths Medical International Ltd.,Kent, UK) with 0.58 mm inner diameter and 0.96 mm outer diameter. These tubes were ﬁlled witha relative copper sulfate model solution (as detailed in section 2.3.1) and arranged as shown insection 2.3.3. The relative copper (rCu) in this model is mimicking blood oxygenation (sO ).For all the phantom experiments the background scattering medium was a fat emulsion (SMOF-lipid 20 %, Fresenius Kabi, Switzerland) diluted to 1.5 % fat content. To avoid errors intro-duced by inter-batch variations in the scattering properties of stock fat emulsions like intralipidor SMOFlipid, the optical properties of the used stock emulsion was assessed with a Time Corre-lated Single Photon Counting (TCSPC) technique as detailed in section 2.3.2. The relative copper sulfate model solution was based on a 2.2 molar nickel sulfate (NiSO ) watersolution, produced using nickel(II) sulfate hexahydrate ( >

98 %, Sigma-Aldrich) and on a 0.25 mo-lar copper sulfate (CuSO ) water solution, produced using copper(II) sulfate pentahydrate ( >

98 %,Sigma-Aldrich). As illustrated in ﬁgure 2, these chromophores are mimicking the NIR absorp-tion spectra of oxy- and deoxyhemoglobin in average whole blood with a hemoglobin concentra-tion c wb (HbT) = 150 gl − . Copper and nickel sulfate were also chosen for their temporal stabilityand resistance to bleaching.

Fig 2

Absorption coefﬁcient µ a spectra. Left: Oxy- and deoxyhemoglobin at whole blood concentrations c wb (HbT)= 150 gl − . Right: Copper and nickel sulfate in aqueous solution in whole blood equivalent solutions using a relativecopper sulfate (rCu) model. The reference measurements of the ﬁve rCu mixtures used in our phantoms are plotted as“+”. Sulfate spectra were measured with a spectrophotometer. The spectra of the sulfate solutions absorption coefﬁcients µ a in whole blood mimicking con-centrations are deﬁned as c wb (NiSO ) := c wb (CuSO ) := The initial reference measurements were donein 2 nm steps, with a 10 s integration time and using a photomultiplier tube (PMT) sensor. Theabsorption spectroscopy measurements were repeated on the solutions after 70 days to verify theirstability over time. Whenever new batches of the sulfate solutions were produced, their absorptionspectra were checked against the spectra of the ﬁrst batch. The solutions were corrected when theydeviated from the reference spectra by more than 1 %.The relative copper (rCu) in this model aims to mimic blood oxygenation (sO ) and is thereforesimilarly deﬁned as rCu = c r (CuSO ) c r (CuSO ) + c r (NiSO ) , (2)with the respective concentrations of the sulfate solutions relative to their blood mimicking basesolutions c r (CuSO ) = c (CuSO ) c wb (CuSO ) and c r (NiSO ) = c (NiSO ) c wb (NiSO ) . (3)For comparison, the deﬁnition of blood oxygen saturation is sO = c (HbO ) c (HbO ) + c (Hb) . (4)While of course not following hemoglobin spectra exactly, this sulfate model is a good qualitativeﬁt to hemoglobin and is much easier to accurately control and reproduce than the saturation ofoxygen in hemoglobin. It is highly stable over time; i.e. over 70 days only changes smaller than1 % in absorption were observed. Mimicking the blood volume fraction (bvf) in tissue we deﬁnea sulfate volume fraction (svf) in our model as svf = c r (CuSO ) + c r (NiSO ) . The svf within theblood vessel mimicking tubing was always 100 %, mimicking whole blood, whereas the svf in thebackground was varied as detailed in section 2.3.3. In the background medium, scattering comparable to tissue (i.e. µ (cid:48) s = 15 cm − at 750 nm) was ob-tained by using a 1.5 % fat emulsion (diluted from SMOFlipid 20 %, Fresenius Kabi, Switzerland).To ensure a reproducible and tissue mimicking scattering, the background medium was anal-ysed with Time Correlated Single Photon Counting (TCSPC) spectroscopy. The TCSPC instru-ment used for the spectral analysis of the emulsions optical properties consisted of a white lightsupercontinuum laser (SuperK Extreme, NKT Photonics, Birkerød, Denmark) with ≈

100 ps pulseduration (varying with wavelength), running at 39 MHz with less than 4 mW laser output. Thiswhite light was ﬁltered by a tunable ﬁlter (SuperK Varia, NKT Photonics, Birkerød, Denmark),which was tuned in a range from 600 nm to 840 nm in 20 nm steps, with a bandwidth of 10 nm;840 nm being the maximum of the tunable ﬁlter’s range. A single-photon avalanche diode (MDPPDM Series, Micro Photon Devices, Bolzano, Italy) was used to detect single photons. The diodehas a prolonged dead time of ≈

80 ns after a photon detection. Because of that, the photon detectionrate was kept sufﬁciently low to make photon detection events during the dead time unlikely. Weensured a detection rate lower than s − ( (cid:28) / ns), making a correction for missed photons5uring the dead time unnecessary. The distributions of times of ﬂight (DTOFs) were recorded withsingle photon counting electronics (SPC-160, Becker & Hickl GmbH, Berlin, Germany). Sourceand detector ﬁber were ﬁxed in blunted hypodermic needles for stability. The laser pulse shape,temporal dispersion in the optical ﬁbers, as well as the response of the detector were characterizedin the overall instrument response function (irf), yielding a FWHM of ≈

140 ps overall, varyingwith wavelength. The source and detection ﬁbers were placed perpendicular to the surface of thesample medium and immersed in the medium by 0.5 mm. To reduce the detection of early arriv-ing photons a carbon ﬁber mesh blocker was placed into the direct path, at a distance of 6 mmfrom the source ﬁber (dimension: 1 mm depth, 4 mm width, 0.4 mm thickness). We measuredthe SMOFlipid 1.5 % medium in an 8 cm radius, 10 cm deep beaker, with the ﬁbers at the center.This is a sufﬁciently large volume to be approximated as a semi-inﬁnite medium for the analyticdiffusion model. The resulting media were both measured with a source detector separation ρ =20 mm, for each wavelength until at least photons were detected. For some wavelengths, thelaser needed to be attenuated in order to keep the photon detection rate below s − . This ac-quisition protocol ensured a high signal-to-noise ratio and allowed us to ﬁt our diffusion modelonly to late arriving photons where the diffusion approximation is more accurate. For the phan-tom experiments two bottles of a new batch of SMOFlipid were used – both batches and bottleswere measured independently prior to experiments to avoid hidden variations in the backgroundmedium.An analytic diffusion model with an extrapolated boundary condition for a semi-inﬁnitemedium

30, 31 was convolved with the corresponding irf for each wavelength λ . The results werethen ﬁtted to the measured histograms of the single photon arrival times, yielding a series of tuples ( µ (cid:48) sSPC ( λ ) , µ aSPC ( λ )) . Our tunable ﬁlter was limited in range to a maximum wavelength 840 nmbut we needed credible µ (cid:48) s values up to 980 nm for the optical forward simulations. Therefore,a generic tissue model (eq. 5) from the mcxyz framework was used to expand and deﬁne thescattering properties within the optical forward simulation. µ (cid:48) s ( λ ) = µ (cid:48) s · ( f ray · ( λ/ nm ) − + (1 − f ray ) · ( λ/ nm ) − b mie ) (5)with µ (cid:48) s = 42 . cm − the initial guess for µ (cid:48) s at 500 nm, f ray = 0.62 the initial guess forfraction of Rayleigh scattering at 500 nm, b mie = 1.0 the initial guess for the scatter power for Miescattering. This was ﬁtted to the TCSPC data with a least squares ﬁt – the entire data processingpipeline with all parameters is part of the open source code supplement. The resulting ﬁts areshown in ﬁgure 3. Three sets of phantoms (A,B,C) were produced, with different layout as shown in ﬁgure 4. Allphantoms use polythene tubing ﬁlled with the relative copper sulfate model solution as targetstructures. The phantom backgrounds consist of a 1.5 % fat emulsion with added sulfates.Phantom layout A was measured as a validation data set for hyperparameter tuning of themachine learning models and validation of image reconstruction as well as parameter tuning in theMonte Carlo simulations. Layouts B and C were exclusively measured as test data sets. Phantomtest set B are expected to be within the distribution of the simulation parameters (cf. ﬁgure 5).Phantom test set C however consists only of longitudinal scans w.r.t. the tube orientation. Becausethe orientation of the illumination positions changes with the imaging plane, set C was illuminated6 ig 3

Optical properties of the uncolored phantom background medium. Single data points are diffusion model resultsfrom measurements with the TCSPC spectroscopy instrument of a 1.5 % fat emulsion (diluted from SMOFlipid 20 %).Validation Phantoms cf. ﬁgure 4a were constructed with the “2nd batch, 1st bottle”. A generic tissue scattering model(eq. 5) ﬁtted to this ﬁrst bottle measurement was used to set the background scattering properties for the Monte Carlosimulations. The “2nd bottle” was used for the background media in the test phantoms. The absorption results areshown together with water absorption. along the tubing. The measurements in set C are therefore expected to be out-of-distribution (OOD)with respect to the Monte Carlo simulated training sets. As detailed in the next section, simulationswere exclusively performed for transversal orientation of the tubing.The phantom data sets contain 164 multispectral MI PA scans from 115 scan conﬁgurations asfollows: A

30 scan conﬁgurations as laid out in ﬁgure 4A: Six phantom conﬁgurations: one with only a1.5 % SMOFlipid background solution, ﬁve with an added 1 % sulfate volume fraction (svf)background with relative copper rCu bg set to {

0, 25, 50, 75, 100 } %. On each of these sixconﬁgurations ﬁve MI multispectral scans were performed centering the transducer on eachof the tubes with rCu tube = {

0, 25, 50, 75, 100 } %. B

55 scan conﬁgurations as laid out in ﬁgure 4B: Eleven phantom conﬁgurations: one with onlya 1.5 % SMOFlipid background solution, ﬁve with an added 1 % svf background with rCu bg = {

0, 25, 50, 75, 100 } % and ﬁve with a 0.5 % svf. On these eleven phantom conﬁgurationsMI multispectral scans were performed centering on each of the ﬁve four-tube-arrays withrCu tube = {

0, 25, 50, 75, 100 } %. For each four-tube-array, two regions of interest (ROI)(one containing the two lower and one the two upper tubes) were analysed separately. Theimaging plane was positioned for transversal scans of the tubes. C

30 scan conﬁgurations as laid out in ﬁgure 4C: Three phantom conﬁgurations: one withonly a 1.5 % SMOFlipid background solution, two with an added 1 % svf background withrCu bg = {

0, 100 } %. On these three phantom conﬁgurations MI multispectral scans wereperformed with each of the ﬁve shallowest tubes, and each of the ﬁve deepest tubes of thefour-tube-arrays in the imaging plane, with rCu tube = {

0, 25, 50, 75, 100 } %. The imagingplane was positioned for longitudinal scans of the tubes.All scan conﬁgurations were scanned for 19.2 s yielding 30 MI and multispectral sequences.Due to the limited ﬁeld of view of our US system (parallel read-out of 64 channels on a 19.2 mm7 ig 4 Cross sections of the phantom data sets, with de-noted parameters: relative copper sulfate model in thetubes rCu tube , in the background medium rCu bg and sul-fate model volume fraction in the background mediumsvf bg . A The validation phantoms with ﬁve single tubes. B The main test phantoms. C The test phantoms in lon-gitudinal scan direction and thereby somewhat out-of-distribution (OOD) of the training data. The shown 2Dcross sections correspond to the imaging plane. In set Aand B the tubes run perpendicular to the imaging plane.Phantom test C has the same geometry as set B, with theimaging plane rotated by 90 degrees to yield longitudinalscans instead of transversal scans of the tubes. linear array) we repositioned the probe between acquisitions – i.e. measuring ﬁve scan positionsfor phantom geometries A and B. The center of the linear transducer was always placed abovethe center of the targeted tubes. Scans with technical difﬁculties such as frame drops or wrongpositioning were discarded in post processing, this affected one of the 115 scan conﬁgurations: therCu tube = 100 %, rCu bg = 50 %, svf = 0.5 % was discarded for erroneous positioning. All scansof Phantom geometry C were performed twice. The svf = 0 scans on phantom geometry B wereperformed ﬁve times on different days as a baseline measurement. The total phantom data setconsists of 164 scans.It is important to note that both copper and nickel sulfate act as a demulsiﬁer when mixedwith the diluted SMOFlipid background, or any other fat in water emulsion. Phases will form andthe bulk optical properties will change signiﬁcantly within tens of seconds. To avoid the formingof phases, the background medium with added sulfates was continuously stirred with a magneticstirrer during all the measurements. As an optical forward model we used GPU accelerated Monte Carlo simulations to generate groundtruth multispectral stacks of the absorbed energy distributions H ( x , λ ) . Figure 5 illustrates thelayout of the Monte Carlo simulated volumes. The simulated data set consists of a 4000 volumetraining set and a separate 1000 volume test set. For each volume 16 wavelength and four positionsof illumination were simulated, modeled on the real MI PA imaging sequences. The simulationswere performed with the open source mcx toolkit and we used the ippai framework for the illu-mination modelling and data organisation. In all data sets, each volume has two sets of tubes withthe tube count drawn from a discrete uniform distribution U { , } , uniformly distributed in thevolume as speciﬁed in ﬁgure 5. Tube and background rCu are drawn from a continuous uniformrandom distribution U (0 , . All tubes were set to a radius of 0.4 mm. The wavelength dependentbackground scattering parameters were set to the tissue model results from the ﬁt of the TCSPCmeasurements to equation 5. The background svf was drawn from U (0 , . Each simulation wasperformed with launched photon packets. Running these simulations on a high performancecomputing cluster we used mostly 1080 GTX (Nvidia, Santa Clara, USA) GPUs, with which asingle wavelength, single illumination position, simulation took approximately two minutes. All8 ig 5 The in silico training data set consists of 4000volumes, simulated with Monte Carlo simulations,modeling the geometry of the real multiple illumina-tion (MI) setup. An additional 1000 volume test setis kept separate. Each volume has two sets of tubeseach with a random number of tubes, uniformly dis-tributed as speciﬁed. Tube and background relativecopper (rCu) as well as background sulfate volumefraction (svf) are also drawn from uniform randomdistributions U . simulations for the test and training sets used a combined 400 days of GPU time. This was madefeasible by usually running 40 GPUs in parallel. It should be noted that this seemingly excessivesimulation time was chosen after simulation results with photon packets proved too noisy. Thiswas evaluated prior to the presented in silico data sets. Initial hyperparameter tuning was also per-formed on two in silico data sets, simulated with photon packets. These data sets are part ofthe supplemental data. The estimation of a sO or rCu value from a measured spectrum is a regression problem. The usualapproach to this problem in PA imaging is linear spectral unmixing (LU).

26, 35

For one pixel, the PAsignal spectrum S ( λ ) is measured at a set of wavelengths λ . This sampled PA signal spectrum S isthen ﬁtted to a linear combination of known absorption spectra. Here, LU is performed numericallyusing an iterative least squares solver implemented in Python’s scipy.optimize submodule. TheseLU estimations (rCu LUest ) are given throughout the results section as a reference.We also compare our results to LSD, a type of machine learning algorithm. LSD also aimsto estimate sO or rCu from the the same single illumination PA signal spectra S measured atwavelengths λ . Similar to prior implementations, our modiﬁed LSD models are machine learningalgorithms that are trained on large amounts of simulated absorbed energy spectra labeled withground truth rCu. Before training, each absorbed energy spectrum is normalized with its L1 normto ˆ H ( λ ) . This normalization makes them equivalent to a normalized PA signal spectrum ˆ S ( λ ) .This is because we can assume that for a signal spectrum S at a position x S ( x , λ ) = Γ( x ) · A ( x ) · H ( x , λ ) (6) ⇒ ˆ S ( x , λ ) ≈ ˆ H ( x , λ ) . (7)Assuming a linear acoustic reconstruction like delay and sum, A ( x ) is a spatially varying butwavelength independent factor introduced by the imperfect acoustic reconstruction, the instrumentresponse and the calibration. Γ( x ) , as a material property is also independent of the illuminationwavelength. The LSD model, which was trained on the in silico training set tuples ( ˆ H , rCu tube ) is then presented (1) unseen in silico test set spectra ˆ H or (2) spectra ˆ S from an unseen phantomdata test set to estimate the corresponding rCu LSDest .Note that A actually does depend on the ﬂuence distribution φ ( x , µ a ( x , λ ) , µ (cid:48) s ( x , λ )) . A vary-ing optical wavelength may lead to different acoustic spectra of the PA signal corresponding to the9ame structure, due to different spatial distributions in the absorbed energy. Our assumption is thatthis effect is small compared to the spectral coloring introduced directly by the ﬂuence term in H ( x , λ ) = φ (cid:0) x , µ a ( x , λ ) , µ (cid:48) s ( x , λ ) (cid:1) · µ a ( x , λ ) . (8) Fig 6

Examples for spectral coloring in L1 normalized spectra of an absorber with rCu tube = 100 %. Comparison be-tween (left) in silico absorbed energy spectra ˆ H and (right) phantom PA signal spectra ˆ S . Spectra for two backgroundmedia are shown: rCu bg = 100 % (yellow), rCu bg = 0 % (dark); with svf = 1 %. The spectra for two illuminationpositions I (line) and I (dotted) are shown as an example for two of the four illuminations. Systematic changesin the spectral coloring can be seen for different background media and illumination positions. These changes arequalitatively similar for ˆ H and ˆ S . On the validation phantom examples LU estimations for single spectra rCu LUest arelisted – spectral coloring can cause large estimation errors relative to the rCu tube = 100 % ground truth.

For MI-LSD we have multiple such normalized spectra ˆ S as input variables. For illustration,ﬁgure 6 shows spectra of the same pixel in an absorber with rCu tube = 100 % with two exampleilluminations I , I and for two backgrounds with rCu bg = { } and svf = 1 %. Thedifference in background absorption causes a different spectral coloring but so does a variation ofthe illumination position. We hypothesise that training our machine learning algorithms on i.e.four such spectra will allow us a more accurate and/or more robust estimation compared to LU andLSD.Two types of machine learning algorithms were employed for spectral decoloring: feed forwardneural networks (NN) and random forests (RF). Training of the MI-LSD models includes mirroredillumination positions for each volume as a minor data augmentation. Sorting the training dataillumination position spectra stacks by their L1 norm before training was also investigated but didnot prove beneﬁcial on the validation data. Feed forward neural networks were previously used for LSD implementations.

15, 19

We used thisstate-of-the-art NN architecture as a starting point and further tuned the hyperparameters on thetraining and validation sets. Doing so we mainly found the dropout layers of previous implemen-tations to be counterproductive – dropout leading to a much lower precision on the validation set.The two ﬁnal NNs used for both LSD and MI-LSD consisted of four hidden layers with twicethe size of the input layer (16 for LSD and 64 for MI-LSD), all with leaky ReLu activation layers(and for comparison, dropout layers). For comparison to the previous implementation, additional10esults for a dropout in the dropout layers with probability p = 0 . are presented in the supple-mental ﬁgures 35-50 and 66-72. In the main results no dropout was used ( p = 0 ). We segmentedall vessels in the 4000 volume training set and trained on the segmented 1,052,152 simulated MIsignal spectra for 100 epochs. As in the previous implementation we used a batch size of anda learning rate of − · . epoch / . We trained the algorithms on a RTX 2060 Super GPU (Nvidia,Santa Clara, USA) and used the CPU for inference. We also investigated random forest regression, usually a highly accurate learning algorithm forregression problems with few dimensions. RFs are also usually less impacted by noise mod-els, they especially should not overﬁt on noise. This should prove useful as we did not try tomodel a realistic wavelength dependent noise. We used the python scikit-learn v0.23 implemen-tation of random forest regressors using 100 trees with a maximum depth of 30 to limit memoryconsumption. Further parameters were set to default.

We ﬁrst show some qualitative comparisons between in silico and phantom data and then presentthe performance of our trained RF and NN models on our in silico test set and the two phantomtest sets.The hyperparameters of the machine learning models were tuned on the phantom validation set.The models that performed best on our validation data were used to estimate rCu from the test sets.These models are presented in the results. For further information, all estimations for all models(on the validation set and for each single test measurement) can be found in the supplementalﬁgures; representative examples are shown here.We compare MI-LSD with LSD and LU. Comparing methods based on a single measurementwith methods based on multiple measurements should generally lead to improved accuracy in themultiple measurement method simply due to an increase in signal-to-noise ratio (SNR). In orderto more fairly compare MI-LSD to the single spectrum methods like LU and LSD, we estimatedLSD and LU results on the reconstructed signals, averaged over the four illuminations. By usingthis averaged illumination spectrum as input for LU and LSD, we can compare methods for thesame delivered energy during the same time – giving no method an inherent SNR advantage. LSDwas also trained on in silico data using the same averaged illumination spectra from four simulatedilluminations.In addition to using the validation data set for hyperparameter tuning, we also qualitativelycompared a set of our measurements to Monte Carlo simulations of one of the validation phan-toms – creating an exact in silico representation of the light propagation in the validation phantom.Figure 7 serves as a qualitative (phantom to in silico ) comparison for some of the averaged illumi-nation spectra.We report the estimation error distributions on the three distinct test sets. Reported are rCuestimation errors ∆ rCu est = rCu est − rCu tube and their absolutes | ∆ rCu est | , with rCu tube being theground truth rCu in the tube. In the in silico test set all selected models are in close agreement.As shown in ﬁgure 8, both LSD and MI-LSD estimations with both RFs and NNs yield median Q absolute estimation errors of less than 3 percentage points (pp). With the lower 90 percentile P for MI-LSD showing fewer outliers and a potentially higher accuracy. As expected, estimation11 ig 7 Qualitative comparison between spectra in a validation phantom (for svf = 0) and its digital twin from MonteCarlo (MC) simulations, showing the effects of various spectral coloring on the mean illumination spectra. Relativecopper rCu tube is varied in the target tube (up-down) and the background medium rCu bg (left-right). For reference,linear unmixing (LU) rCu estimates are given for each spectrum. with all used models is very fast compared to LU. Inference on CPU for all the 266,105 samplesin the in silico test set took 1.6 s for RF MI-LSD, 1.3 s for RF LSD, 0.2 s for NN MI-LSD, 0.04 sfor NN LSD, compared to 642 s for LU. 12 ig 8 Error distributions of the in silico test set cf. ﬁgure 5. rCu estimation errors ∆ rCu est are shown left, theirabsolutes right. Blue shows the rCu estimators using multiple illumination learned spectral decoloring (MI-LSD),orange the estimators using learned spectral decoloring (LSD) and gray is the linear spectral unmixing (LU) reference.Medians Q of the error distributions are shown, together with interquartile ranges (IQR) and 90 percentiles P . Thetwo feed forward neural network (NN) models and the two random forest (RF) models all have median absolute errorsbelow 3 percentage points. Fig 9

Example region of interest (ROI) in the phantom test set B with the estimation results for MI-LSD, LSD andLU. Shown are the lower two tubes of a four tube phantom with svf = 0.5 % and rCu bg = 25 %. Mean PA signal isshown left, with the ground truth rCu tube annotated. The mean rCu estimate rCu over the ROI is noted for the threeestimators. From phantom test set B, tube signal was segmented by thresholding. In each reconstructedMI-multispectral PA image stack two ROIs were chosen, one containing the two upper tubes andone containing the lower two tubes. One such lower tubes ROI is shown in ﬁgure 9. Each ROIhas a ﬁxed size of 3.75 mm times 3.3 mm. The 15 % highest mean (over all wavelengths andilluminations) PA signal pixels in each ROI were segmented as tube and rCu was estimated from13 ig 10

Example regions of interest (ROI) 8 mm deep in the phantom test set C with estimation results for MI-LSD, LSD and LU. Shown are two times ﬁve imaged tubes with their rCu tube annotated above their mean PA signal.The mean rCu estimate rCu over the ROI is noted for the three estimators. The ROIs for two sets of phantoms areshown. A a representative result (rCu bg = 100 %, 1 % svf background), B a result with outlier estimation errors (0 %svf background). LSD has highest estimation errors in deep vessels and in phantoms with no added sulfates in thebackground medium, i.e. in B for rCu tube = 100 %. the MI-multispectral PA signals in all pixels of these tube signal areas. The ROIs were thresholdedseparately in order to get an equal number of lower tube samples into the test set. A thresholdingon the entire image or a larger ROI, using a lower cut-off percentage, would lead to more clutterand noise in the test set and the lower tubes being underrepresented in the test set.From phantom test set C, tube signal was segmented in a similar fashion: from each recon-structed MI-multispectral PA signal image stack a ROI of ﬁxed size (7.5 mm times 1.5 mm) wasselected, containing either the upper tube or the lower tube. Two such lower tubes example ROIsare shown in ﬁgure 10A&B for varying reference rCu tube . Within these ROIs the 50 % highest mean(over all wavelengths and illuminations) PA signal pixels were segmented as tube . rCu was thenestimated from the MI-multispectral PA signals in all pixels within these tube signal locations.The estimated rCu image examples from the test sets are shown for the RF models, becausewith the exception of the in silico test set, NN models performed similarly or worse than RFmodels. For all estimated rCu images from all models, see the supplemental ﬁgures. The errordistributions for phantom test set B are shown in ﬁgure 11 and for phantom test set C in ﬁgure 12.Descriptive statistics of the relative error distributions in the estimated rCu data are reported inTable 1 for the two phantom test sets B and C. The qualitative comparison of the absorbed energy spectra from the Monte Carlo simulations andthe phantom PA signal spectra reveals a general agreement between the simulations and the phan-tom results. The existing variations between the normalized spectra of the two domains are likelydue to discrepancies in the simulation e.g. the beam proﬁles and the the optical properties of the14 ig 11

Error distributions of the phantom test set B (cf. ﬁgure 4B). rCu estimation errors ∆ rCu est are shown left, theirabsolutes right. Blue shows the rCu estimators using multiple illumination learned spectral decoloring (MI-LSD),orange the estimators using learned spectral decoloring (LSD) and gray is the linear spectral unmixing (LU) reference.Medians Q of the error distributions are shown, together with interquartile ranges (IQR) and 90 percentiles P . Thefeed forward neural network (NN) models performed similar to the random forest (RF) models for the LSD methodbut underperformed for MI-LSD. set ∆ rCu est [pp] | ∆ rCu est | [pp]mean Q Q Q mean Q Q Q P RF MI-LSD B 0.6 -2.7 1.1 3.4 4.1 1.4 2.9 5.3 8.8C 1.8 -3.1 1.7 6.3 5.6 2.1 4.5 7.9 12.4LSD B -1.9 -4.4 0.2 2.2 5.2 1.5 3.3 6.2 10.7C -2.8 -5.3 0.6 2.6 7.1 1.7 3.9 7.9 13.7NN MI-LSD B -11.3 -18.4 -3.6 0.3 12.8 1.3 5.3 18.4 36.2C -21.0 -38.2 -12.0 0.1 22.0 2.2 12.0 38.2 58.1LSD B -3.1 -5.9 -0.3 1.8 6.4 1.1 3.4 8.1 16.7C -8.7 -15.1 -2.6 1.3 11.4 1.7 5.8 15.7 32.6LU B -1.2 -8.8 0.1 6.6 8.2 4.0 7.4 11.1 15.2C -1.0 -8.0 -0.5 6.3 8.7 3.4 7.2 12.5 18.2

Table 1

Relative rCu estimation errors( ∆ rCu est ) and absolute rCu estimation errors( | ∆ rCu est | ) for the random forests(RF), neural networks (NN) and linear unmixing. Mean, median Q , 1st and 3rd quartiles Q and Q , and the 90percentile P are listed for the phantom test sets B (transversal tubes) and C (longitudinal tubes). gel pad. The gel pad for example is currently simulated as water but also has some low-level scat-tering properties which was omitted in the simulation. Additionally, realistic laser noise was notsimulated and the phantom positioning was only accurate to a millimeter. An acoustic forwardsimulation (e.g. using k-wave) was also not included in the simulation pipeline due to computa-tional time constraints. While there are some acoustic artifacts (e.g. reﬂection artifacts) in the realPA image reconstructions, it is sensible to assume that they do not vary for different wavelengthillumination, therefore their effect on spectral coloring should be negligible. Variations in theGr¨uneisen parameter were also not part of the training set, even though it does vary signiﬁcantlywith rCu, because Γ( c wb ( NiSO )) ≈ . and Γ( c wb ( CuSO )) ≈ . at room temperature. This15 ig 12 . Error distributions of the phantom test set C (cf. ﬁgure 4C). rCu estimation errors ∆ rCu est are shown left, theirabsolutes right. Blue shows the rCu estimators using multiple illumination learned spectral decoloring (MI-LSD),orange the estimators using learned spectral decoloring (LSD) and gray is the linear spectral unmixing (LU) reference.Medians Q of the error distributions are shown, together with interquartile ranges (IQR) and 90 percentiles P . Thefeed forward neural network (NN) models performed similar to the random forest (RF) models for the LSD methodbut underperformed for MI-LSD. results in a systematically higher SNR for low rCu – an effect not present in sO , which mayexplain why high rCu estimations are systematically worse in all of our phantom test sets. Lasernoise levels are also wavelength dependent which is reinforced by the pulse energy correction –e.g. resulting in a factor two SNR when measuring at 800 nm compared to 680 nm.Our MI-LSD method with RF estimators was highly accurate with median absolute estimationerrors of only 2.9 and 4.5 percentage points in the two phantom test sets, respectively. Our NNmodels, however failed to give accurate estimates for MI-LSD. LSD estimates, using NN, wereonly improved over the LU reference and only in the phantom test set B. When testing on theout-of-distribution (OOD) test set C, our NN models showed no clear improvement over LU. Thisleads us to the initial conclusion that the overly complex NN models are prone to overﬁtting on the in silico data, even when optimizing their hyperparameters with simple phantom data. The attemptto remedy this with dropout layers, lead to overall inaccurate estimations.It is not surprising that the overall quantiﬁcation performance was worse in deeper tubes. SNRin 8 mm deep tubes was very low, i.e. the longer distance illuminations with 980 nm light oftenyielded no detectable PA signal. This is due to background water absorption in combination withthe high scattering, even when adding no sulfates to the background medium. We therefore alsoinvestigated omitting these higher wavelengths – training and testing with fewer wavelengths from680 nm to 920 nm, 20 nm spaced. This yielded obviously worse model performance overall, whicheither suggests that (these) 13 wavelengths are insufﬁcient for accurate estimation, or suggeststhat spectral coloring due to water absorption can be useful for a pixel wise correction of spectralcoloring, as it can give implicit information on the optical path length. For further investigation itmay be useful to add explicit information on the pixel position to the input features. It may alsobe interesting to perform similar experiments with a wider range of and/or more lower wavelengthmeasurements, and then optimize the wavelength selection on these oversampled multispectral16 ig 13 Worst estimation example in Phantom test set B: deep tubes ( B ) compared with more shallow tubes ( A ) inthe same phantom with a svf = 0 %. Estimation results for MI-LSD, LSD and LU. Mean PA signal is shown left,with the ground truth rCu tube annotated. The mean rCu estimate rCu over the ROI is noted for the three estimators.Shallow tubes can be estimated very accurately while lower SNR in deep, high rCu tubes correlates to poor estimationaccuracy. sequences. This was not done in this initial proof-of-concept work because it risks over-optimizingon unrealistic aspects of the rCu model (e.g. the difference in Gr ¨uneisen parameter of the twosulfate solutions) or setup speciﬁc aberrations (e.g. wavelength dependent SNR). Simulating morewavelengths also prolongs the already computationally expensive, 400 GPU days, simulation timefor the necessary training data.A ﬁnal somewhat surprising observation was that estimation of both LSD and to a lesser extentMI-LSD is poorest in phantoms with no added sulfates in the background medium. Figure 13Bshows the worst estimation results in the lower tubes of phantom test set B – combining threedetrimental circumstances: (1) great depth (2) high rCu and (3) only spectral coloring of water.Though even in this worst case, MI-LSD is more accurate than the LSD or LU estimations.One of the main shortcomings of the presented phantom validation is that it did not modelmelanin absorption in skin. Spectral coloring by melanin still causes large errors in standard ofcare pulse oximetry devices

39, 40 and needs to be addressed for qPAI. We were however not ableto reproducibly include a skin mimicking layer with varying melanin absorption in our liquidphantoms – future work will address this, by using solid, layered gel wax phantoms. We showed a proof-of-concept setup with comparably poor image quality due to the US DAQand transducer. An in vivo applicable system should make use of state of the art US components17nd further engineering improvements to sensitivity and SNR as this currently limits the achiev-able estimation accuracy. Wavelength selection and illumination geometry are, so far, largelyunoptimized. Lastly, while the rCu model is a very useful tool for the investigation and thoroughvalidation of a quantitative PA oximetry method, an explicit translation to actual sO estimation invivo must be the next step. We presented MI-LSD, a quantitative photoacoustic oximetry method using multiple illuminationsand machine learning; and presented a real-time MI PA imaging setup with a linear ultrasoundtransducer. We used this setup to image 115 phantom conﬁgurations by employing a highly reli-able, reproducible and easily scalable phantom model.MI-LSD with RFs was able to accurately and quickly estimate blood oxygen saturation mod-eled by copper and nickel sulfate. Compared to LU, MI-LSD approximately halved the absoluterelative estimation error, achieving median absolute estimation errors of only 2.9 and 4.5 percent-age points in our two phantom test sets, respectively. To investigate such ML regression methods,thorough phantom validation is critical, as in silico tests do not give sufﬁcient data to validate amethod, and in vivo measurements lack a reliable ground truth. This is further illustrated by thefact that previously reported LSD NN models, which were only validated on in silico data, slightlyoutperformed RF models on in silico data (as was previously reported) but underperformed RFmodels in phantom tests while simply breaking on out-of-distribution (OOD) phantom data.The results of this study give us a high degree of conﬁdence that the domain gap from insilico spectral decoloring to real data can be bridged using MI-LSD, paving the way to a clinicalapplication of quantitative photoacoustic oximetry imaging.

Disclosures

The authors have no relevant ﬁnancial interests in this article and no potential conﬂicts of interestto disclose.

Acknowledgments

This work has been funded by the Swiss National Science Foundation under project no. 179038.Most calculations were performed on UBELIX, the HPC cluster at the University of Bern. Wethank Michael Jaeger for his US data acquisition scripts, expertise and proof-reading; and AdrianJenk at the Institute of Applied Physics mechanical workshop for his mechanical support. For theircontinuous support of the open source Medical imaging interaction toolkit (MITK) and the ippaipython package, as well as for fruitful discussions we thank Janek Gr ¨ohl and the Photoacousticsteam at the Computer Assisted Medical Interventions division, German Cancer Research Center,Heidelberg.

Author Contributions

Conceptualization, T.K., M.F.; methodology, software, implementation, experiments, analysis,original draft preparation, T.K.; review and editing, T.K., M.F.; supervision, M.F.18 ode, Data, and Materials Availability

The code for the methods as well as the experiments was implemented in python 3.7 and is fullyopen source, available at github:thkirchner/PA-MI-LSD. All training, validation and test data setsgenerated in this work are openly available at doi:10.5281/zenodo.4549631. The raw Monte Carlosimulation results and raw PA scans are too large for upload – 3 TB – but available from the authorsupon reasonable request.

References et al. , “Quantitative spectroscopic photoacoustic imaging:a review,”

Journal of biomedical optics (6), 061202 (2012).2 B. T. Cox, S. R. Arridge, K. P. K ¨ostli, et al. , “Two-dimensional quantitative photoacoustic im-age reconstruction of absorption distributions in scattering media by use of a simple iterativemethod,” Applied Optics (8), 1866–1875 (2006).3 S. Tzoumas, A. Nunes, I. Oleﬁr, et al. , “Eigenspectra optoacoustic tomography achievesquantitative blood oxygenation imaging deep in tissues,” Nature communications , 12121(2016).4 V. Perekatova, P. Subochev, M. Y. Kirillin, et al. , “Fluence compensated optoacoustic mea-surements of blood oxygen saturation in vivo at two optimal wavelengths,” in Photons PlusUltrasound: Imaging and Sensing 2017 , , 100645K, International Society for Opticsand Photonics (2017).5 J. Glatz, N. C. Deliolanis, A. Buehler, et al. , “Blind source unmixing in multi-spectral optoa-coustic tomography,” Optics express (4), 3175–3184 (2011).6 L. Ulrich, K. G. Held, M. Jaeger, et al. , “Reliability assessment for blood oxygen saturationlevels measured with optoacoustic imaging,” Journal of biomedical optics (4), 046005(2020).7 L. Ulrich, L. Ahnen, H. G. Akarc¸ay, et al. , “Spectral correction for handheld optoacousticimaging by means of near-infrared optical tomography in reﬂection mode,” Journal of bio-photonics (1), e201800112 (2019).8 T. Kirchner, J. Gr ¨ohl, and L. Maier-Hein, “Context encoding enables machine learning-basedquantitative photoacoustics,” Journal of biomedical optics (5), 056008 (2018).9 G. P. Luke, K. Hoffer-Hawlik, A. C. Van Namen, et al. , “O-net: A convolutional neuralnetwork for quantitative photoacoustic image segmentation and oximetry,” arXiv preprintarXiv:1911.01935 (2019).10 C. Cai, K. Deng, C. Ma, et al. , “End-to-end deep neural network for optical inversion inquantitative photoacoustic imaging,” Optics letters (12), 2752–2755 (2018).11 C. Yang, H. Lan, H. Zhong, et al. , “Quantitative photoacoustic blood oxygenation imagingusing deep residual and recurrent neural network,” in , 741–744, IEEE (2019).12 D. A. Durairaj, S. Agrawal, K. Johnstonbaugh, et al. , “Unsupervised deep learning approachfor photoacoustic spectral unmixing,” in Photons Plus Ultrasound: Imaging and Sensing2020 , , 112403H, International Society for Optics and Photonics (2020).13 C. Bench, A. Hauptmann, and B. T. Cox, “Toward accurate quantitative photoacoustic imag-ing: learning vascular blood oxygen saturation in three dimensions,” Journal of BiomedicalOptics (8), 085003 (2020). 194 K. G. Held, M. Jaeger, J. Riˇcka, et al. , “Multiple irradiation sensing of the optical effectiveattenuation coefﬁcient for spectral correction in handheld oa imaging,” Photoacoustics (2),70–80 (2016).15 J. Gr¨ohl, T. Kirchner, T. J. Adler, et al. , “Learned spectral decoloring enables photoacousticoximetry,” Scientiﬁc Reports (accepted) (0), 0 (2021).16 P. Shao, B. Cox, and R. J. Zemp, “Estimating optical absorption, scattering, and grueneisendistributions with multiple-illumination photoacoustic tomography,” Applied optics (19),3145–3154 (2011).17 A. N. S. Institute, American National Standard for Safe Use of Lasers , Laser Institute ofAmerica (2007).18 M. Kim, G.-S. Jeng, M. O’Donnell, et al. , “Correction of wavelength-dependent laser ﬂuencein swept-beam spectroscopic photoacoustic imaging with a hand-held probe,”

Photoacoustics , 100192 (2020).19 J. Gr¨ohl, T. Kirchner, T. Adler, et al. , “Estimation of blood oxygenation with learnedspectral decoloring for quantitative photoacoustic imaging (lsd-qpai),” arXiv preprintarXiv:1902.05839 (2019).20 S. Tzoumas, A. Nunes, I. Oleﬁr, et al. , “Eigenspectra optoacoustic tomography achievesquantitative blood oxygenation imaging deep in tissues,” Nature Communications , 12121(2016).21 W. C. Vogt, X. Zhou, R. Andriani, et al. , “Photoacoustic oximetry imaging performanceevaluation using dynamic blood ﬂow phantoms with tunable oxygen saturation,” Biomedicaloptics express (2), 449–464 (2019).22 J. Laufer, C. Elwell, D. Delpy, et al. , “In vitro measurements of absolute blood oxygensaturation using pulsed near-infrared photoacoustic spectroscopy: accuracy and resolution,” Physics in Medicine & Biology (18), 4409 (2005).23 T. Mitcham, H. Taghavi, J. Long, et al. , “Photoacoustic-based so2 estimation through excisedbovine prostate tissue with interstitial light delivery,” Photoacoustics , 47–56 (2017).24 J. Buchmann, B. Kaplan, S. Powell, et al. , “Quantitative pa tomography of high resolution3-d images: experimental validation in a tissue phantom,” Photoacoustics , 100157 (2020).25 L. Wang and M. Xu, “Photoacoustic imaging in biomedicine,” Rev. Sci. Instrum. (4),041101–22 (2006).26 T. Kirchner, J. Gr¨ohl, F. Sattler, et al. , “An open-source software platform for translationalphotoacoustic research and its application to motion-corrected blood oxygenation estima-tion,” arXiv preprint arXiv:1901.09781 (2019).27 M. B. Fonseca, L. An, and B. T. Cox, “Sulfates as chromophores for multiwavelength pho-toacoustic imaging phantoms,” Journal of Biomedical Optics (12), 125007 (2017).28 S. Prahl, Tabulated molar extinction coefﬁcient for hemoglobin in water , Oregon MedicalLaser Center, Portland, Tech. Rep (1998).29 R. Groenhuis, H. A. Ferwerda, and J. Ten Bosch, “Scattering and absorption of turbid materi-als determined from reﬂection measurements. 1: Theory,”

Applied optics (16), 2456–2462(1983).30 R. Cubeddu, A. Pifferi, P. Taroni, et al. , “Compact tissue oximeter based on dual-wavelengthmultichannel time-resolved reﬂectance,” Applied optics (16), 3670–3680 (1999).201 A. H. Hielscher, S. L. Jacques, L. Wang, et al. , “The inﬂuence of boundary conditions on theaccuracy of diffusion theory in time-resolved reﬂectance spectroscopy of biological tissues,” Physics in Medicine & Biology (11), 1957 (1995).32 S. L. Jacques, “Coupling 3d monte carlo light transport in optically heterogeneous tissues tophotoacoustic signal generation,” Photoacoustics (4), 137–142 (2014).33 D. J. Segelstein, The complex refractive index of water . PhD thesis, University of Missouri–Kansas City (1981).34 Q. Fang and D. A. Boas, “Monte carlo simulation of photon migration in 3d turbid mediaaccelerated by graphics processing units,”

Optics express (22), 20178–20190 (2009).35 S. Tzoumas and V. Ntziachristos, “Spectral unmixing techniques for optoacoustic imaging oftissue pathophysiology,” Philos Trans A Math Phys Eng Sci (2017).36 L. Breiman, “Random forests,”

Machine learning (1), 5–32 (2001).37 T. Kirchner, J. Gr ¨ohl, and L. Maier-Hein, “Context encoding enables machine learning-basedquantitative photoacoustics,” J Biomed Opt , 1–9 (2018).38 E. V. Savateeva, A. A. Karabutov, S. V. Solomatin, et al. , “Optical properties of blood atvarious levels of oxygenation studied by time-resolved detection of laser-induced pressureproﬁles,” in Biomedical Optoacoustics III , , 63–75, International Society for Optics andPhotonics (2002).39 J. R. Feiner, J. W. Severinghaus, and P. E. Bickler, “Dark skin decreases the accuracy ofpulse oximeters at low oxygen saturation: the effects of oximeter probe type and gender,” Anesthesia and Analgesia , S18–23, tables of contents (2007).40 P. E. Bickler, J. R. Feiner, and J. W. Severinghaus, “Effects of skin pigmentation on pulseoximeter accuracy at low saturation,”

The Journal of the American Society of Anesthesiolo-gists (4), 715–719 (2005).41 E. Maneas, W. Xia, O. Ogunlade, et al. , “Gel wax-based tissue-mimicking phantoms formultispectral photoacoustic imaging,”

Biomedical optics express (3), 1151–1163 (2018). List of Figures µ a spectra.3 Optical properties of the uncolored phantom background medium.4 Cross sections of the phantom sets.5 Cross sections of the phantom geometries.6 Examples of L1 normalized spectra.7 Qualitative comparison between a validation phantom and its digital twin.8 Error distributions of the in silicoin silico