Medical Image Quality Metrics for Foveated Model Observers
Miguel A. Lago (a), Craig K. Abbey (a), Miguel P. Eckstein (a,b,*)
(a) Department of Psychological and Brain Sciences, University of California at Santa Barbara, Santa Barbara, CA 93106 USA
(b) Department of Electrical and Computer Engineering, University of California at Santa Barbara, Santa Barbara, CA 93106 USA
Abstract
Purpose: A recently proposed model observer mimics the foveated nature of the human visual system by processing the entire image with varying spatial detail, executing eye movements, and scrolling through slices. The model can predict how human search performance changes with signal type and modality (2D vs. 3D), yet its implementation is computationally expensive and time-consuming. Here, we evaluate various image quality metrics using extensions of the classic index of detectability expressions and assess foveated model observers for location-known-exactly tasks.
Approach: We evaluated foveated extensions of a Channelized Hotelling model and a Non-prewhitening model with an eye filter. The proposed methods involve calculating a model index of detectability (d') for each retinal eccentricity and combining these with a weighting function into a single detectability metric. We assessed different versions of the weighting function that varied in the required measurements of the human observers' search (no measurements; eye movement patterns; or size of the image and median search times).
Results: We show that the index of detectability across eccentricities weighted using the eye movement patterns of observers best predicted human performance in 2D vs. 3D search for a small microcalcification-like signal and a larger mass-like signal. The metric with a weighting function based on median search times was the second best at predicting human results.
Conclusions: The findings provide a set of model observer tools to evaluate image quality in the early stages of imaging system evaluation or design without implementing the more computationally complex foveated search model.
Keywords: model observers, psychophysics, visual search, 3D image modalities
* Third Author, E-mail: [email protected]
Introduction
For over four decades, researchers have worked on metrics of medical image quality. Some efforts have concentrated on physical measures of image quality that consider the relationship between the signal, noise, and the imaging system's modulation transfer function [1]. Researchers increasingly sought to incorporate the human visual system's properties to capture a radiologist's performance in visually detecting and classifying lesions. These mathematical formulations are known as model observers. They can be applied to the statistical properties of the signals and backgrounds or to the actual images, and they yield a measure of signal detection performance [1]–[11]. Over this period, the field of medical imaging has made progress in the development and evaluation of model observers. Early work concentrated on computer-generated backgrounds and simple detection tasks with a single signal appearing at one or a few locations (location-known exactly, LKE) [4], [6], [10]–[14]. With the advent of large-scale digitalization of medical images in the mid-1990s, model observers were first applied to large samples of anatomical backgrounds extracted from x-ray coronary angiograms and mammograms [15]–[22]. Further developments included signals that could vary in size and shape [10], [20], [22]–[25], dynamic and/or 3D components [3], [26]–[33], and incorporation of properties of the human visual system including spatial frequency channels, internal noise, non-linear transducers, and divisive normalization [11], [13], [34]. Still, a majority of studies have evaluated the validity of model observers with simple detection tasks for which the target might appear in a single or a few known locations within the image [10], [20], [23], [30], [35]–[47].
The underlying assumption is that model observer performance for simple visual detection tasks with a few known locations reliably predicts performance with more clinically realistic tasks, which involve searching for the lesion across a larger region in the image. If so, conclusions from evaluations and optimizations of imaging systems with simple LKE tasks will be applicable and valid in more clinically realistic search tasks. For search in 2D images, researchers have used signal detection theory to quantitatively map performance across tasks with varying numbers of known possible signal locations [12], [15], [48]–[50]. Investigators have also demonstrated shortcomings of the simple location-known exactly detection tasks and motivated the need to develop model observers that account for the visual search process [51]–[55]. Studies have also found discrepancies between the rank ordering of imaging systems based on LKE and search tasks [56], [57]. The dissociation between LKE tasks and search tasks becomes even more pronounced with 3D images. When searching 3D images, observers typically do not use eye movements to exhaustively scrutinize every region of each slice with the fovea, the central region of the retina, which processes visual information with high spatial detail [58]–[60]. Instead, observers rely on processing much of the 3D image data with the visual periphery, which has lower spatial acuity. In contrast, for LKE tasks or even 2D search, observers scrutinize all possible target locations with the fovea. Studies have shown that differences in how observers visually process information in LKE, 2D, and 3D images lead to dissociations in human performance [61].
Lago et al. [62], [63] have shown that, for 3D images, conventional model observers [3], [5], [8], [10], [11], [13], [16], [27], [30], [35], [38], [42], [45], [64] that do not take into account the degradation in processing in the human visual periphery are unable to predict detriments in performance for small signals in 3D search. They proposed a model (foveated search model) that combines elements of model observers for medical images and those of search from vision science [65]–[70]. The model processes the visual field with varying spatial detail based on the current fixation (point of gaze of the model), makes eye movements, scrolls through the slices, and integrates information across the search process to reach target present/absent decisions. The advantage of the model is that, unlike traditional model observers, it mimics how humans scrutinize images, capturing aspects of human behavior such as explicit saccades and scrolls. This model is also able to predict the source of errors (search vs. recognition errors). The main disadvantage is that the model is computationally intensive and cumbersome to implement. It requires N times more dot products than a scanning observer [51], [52], [71], where N is the number of templates for the foveated model, and it also requires some complementary information about the extent of the exploratory eye movement behavior of humans (e.g., percentage of the volume explored). Thus, our current goal is to investigate several possible simplifications using traditional calculations for signal detectability for signal known exactly/location known exactly [1], [2], [8], [42], [64] and relate them to the foveated search model's predictions for 2D and 3D search. The approach requires a model that can predict humans' ability to detect the signal as a function of retinal eccentricity.
If successful, these image quality metrics based on detectability calculations should provide a first-pass approximation to the previously proposed full foveated search model (FSM). The metrics could potentially be used as a proxy to evaluate and optimize image quality without implementing the cumbersome FSM. We first present the theory and figures of merit based on the index of detectability for SKE/LKE tasks and their extensions for a foveated model. We then assess whether the figures of merit for the foveated model can predict human performance during 2D and 3D search in 1/f filtered noise. In particular, we evaluate whether the index of detectability metrics for the foveated model (based on an extension of the Channelized Hotelling Observer [14], [26], [34], [35], [39], [40] and the Non-Prewhitening Model with Eye Filter [5], [10], [13], [45]) can correctly predict two human results:
1. The detriment of 3D search for a small sharp-edged signal that mimics a microcalcification (when compared to 2D search);
2. No detrimental influence of 3D images on performance for a larger mass-like signal.
Theory
A majority of model observers are evaluated by taking the dot product between a pre-defined model template $\mathbf{w}$ and data $\mathbf{g}_i$ at the possible target locations $i$, yielding a scalar response $\lambda_i$ used to make decisions [1], [5], [8], [39], [72]:
$$\lambda_i = \mathbf{w}^t \mathbf{g}_i \tag{1}$$
Models vary in the assumed templates, which can take into account signal luminance properties [2], [10], [13], [41], [44], [45], noise statistical properties [4], [5], [73], [74], and constraints of the human visual system [14], [34], [72], [75]. To quantify search performance in a detection task, the decision variable ($\lambda$) of the model can be evaluated for target-present and target-absent trials to compute the hit rate, false-positive rate, and an area under the ROC curve by varying the decision threshold. Often, when the decision variable is Gaussian distributed, performance is also quantified using a $d'$ index of detectability, which quantifies the difference in response to the signal vs. noise in standard deviation units [4], [64], [76]:
$$d' = \frac{\lambda_s - \lambda_n}{\sigma_\lambda} \tag{2}$$
where $\lambda_s$ and $\lambda_n$ are the mean responses on signal-present and signal-absent trials and $\sigma_\lambda$ is the standard deviation of the response. When the model observer is linear, the index of detectability can be directly calculated from the template $\mathbf{w}$, the signal luminance profile $\mathbf{s}$, and the covariance matrix $\mathbf{K}$:
$$d' = \frac{\mathbf{w}^t \mathbf{s}}{\sqrt{\mathbf{w}^t \mathbf{K} \mathbf{w}}} \tag{3}$$
where the superscript $t$ denotes transpose. Furthermore, when the noise is statistically stationary (the variance and covariance do not depend on location), $d'$ can be estimated from the Fourier domain:
$$d' = \frac{\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} \overline{w(u, v)}\, s(u, v)\, du\, dv}{\sqrt{\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} |w(u, v)|^2\, N(u, v)\, du\, dv}} \tag{4}$$
where $w(u, v)$ is the template, $s(u, v)$ is the signal luminance, and $N(u, v)$ is the noise power spectrum. $|w(u, v)|$ stands for the absolute value of the template and $\overline{w(u, v)}$ stands for the complex conjugate of the template.
Index of detectability for each retinal eccentricity
The classic theoretical treatment of model observers, described in the previous section, considers a single template representing the central vision of the human observer. However, for many new applications, such as 3D images, the observer typically utilizes other parts of their visual field to scrutinize image data. A first step is to characterize a performance metric for different parts of the visual field. Instead of defining a single index of detectability $d'$, image quality can be characterized by a family of detectabilities spanning the visual field. As a first approximation, we assume that the deterioration in visual quality is isotropic (rotationally invariant) as a function of distance from the fovea and is represented by the distance of the signal location from central vision in degrees of visual angle. The index of detectability can then be extended to $d'_E$:
$$d'_E = \frac{\lambda_{s,E} - \lambda_{n,E}}{\sigma_{\lambda_E}} \tag{5}$$
Each of the indices of detectability $d'_E$ is calculated from the decision variables ($\lambda$) obtained with the corresponding perceptual template for a given eccentricity, $\mathbf{w}_E$. The result is a collection of $d'$s spanning different retinal eccentricities. Thus, equation 3 can be extended to:
$$d'_E = \frac{\mathbf{w}_E^t \mathbf{s}}{\sqrt{\mathbf{w}_E^t \mathbf{K}_E \mathbf{w}_E}} \tag{6}$$
For statistically stationary noise, the Fourier expression of equation 4 can be generalized for different eccentricities and associated templates:
$$d'_E = \frac{\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} \overline{w_E(u, v)}\, s(u, v)\, du\, dv}{\sqrt{\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} |w_E(u, v)|^2\, N(u, v)\, du\, dv}} \tag{7}$$
where the terms are defined below equation 4.
The previous section detailed how to calculate an index of detectability for each retinal eccentricity. The next step is to aggregate these indices into a single measure that can be used as a figure of merit to evaluate or optimize image quality for search.
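Before turning to the aggregation, the per-eccentricity calculation of equation 6 can be made concrete with a minimal sketch. The signal, covariance, and templates below are toy values chosen for illustration only (they are assumptions, not the stimuli or templates of this study), and the function name `dprime_linear` is hypothetical.

```python
import math

def dprime_linear(w, s, K):
    """Equation 6 for one eccentricity: d'_E = w^t s / sqrt(w^t K w)."""
    numerator = sum(wi * si for wi, si in zip(w, s))
    Kw = [sum(kij * wj for kij, wj in zip(row, w)) for row in K]
    variance = sum(wi * kwi for wi, kwi in zip(w, Kw))
    return numerator / math.sqrt(variance)

# Toy 3-pixel signal in white noise with variance 4 (illustrative values).
s = [1.0, 2.0, 2.0]
K = [[4.0, 0.0, 0.0],
     [0.0, 4.0, 0.0],
     [0.0, 0.0, 4.0]]

# Hypothetical templates: a matched template at the fovea (E = 0) and a
# flat, blurred template standing in for a peripheral eccentricity (E = 6).
templates = {0: [1.0, 2.0, 2.0], 6: [1.0, 1.0, 1.0]}

# One d' per eccentricity: the "family of detectabilities" described above.
dprime_by_ecc = {E: dprime_linear(w, s, K) for E, w in templates.items()}
print(dprime_by_ecc)
```

In this toy case the blurred peripheral template is no longer matched to the signal, so its detectability is lower than the foveal one, which is the qualitative behavior the foveated templates are meant to capture.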
We can compute a weighted average across eccentricities:
$$\langle d' \rangle = \sum_{E=0}^{e} m_E\, d'_E \tag{8}$$
where $m_E$ is a weight assigned to the $d'_E$ at different eccentricities, and the summation extends from the fovea ($E = 0$) to the largest eccentricity in the image ($e$). One possibility is trying to optimize a simple average ($m_E = 1$ for all eccentricities) of all the $d'$s across the entire visual field. This strategy would give equal importance to visual processing at all points in the visual field. Yet, it would not reflect the well-known fact that humans make three eye movements per second to fixate objects of interest [77], [78] and, even though they are able to process information in parallel [79], they predominantly use foveal information to make perceptual decisions. A second possibility is to use a weighting function that gives more weight to the foveal region and is proportional to the normalized detectability ($\sum_E \mathrm{norm}(d'_E) = 1$) at a given eccentricity, $m_E = \mathrm{norm}(d'_E)$. In this case, the optimization would be over:
$$\langle d' \rangle = \sum_{E=0}^{e} \mathrm{norm}(d'_E)\, d'_E \tag{9}$$
$d'_E$ estimated from eye movement measurements (ET closest fix)
Arguably, the most accurate approach would be to have an estimate of the probability that the observer uses different parts of the visual field to process the signal during the process of visual search on each trial. Thus, the $d'$ on each trial will vary depending on the specific eye movements and the various eccentricities of the signal across the multiple fixations. For simplicity, we assume that the maximum attainable $d'$ for a trial is given by the minimum retinal eccentricity attained by the signal during the eye movement search process. The aggregate $d'$ across all trials can be calculated by weighting each $d'_E$ by an estimate of the probability that the minimum retinal eccentricity of the signal attains a value $E$, $m_E = p(E_{min})$. The effective $d'$ across trials is then estimated as:
$$\langle d' \rangle = \sum_{E=0}^{e} p(E_{min})\, d'_E \tag{10}$$
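The three weighting schemes of equations 8 to 10 differ only in how $m_E$ is chosen, which the following minimal sketch tries to illustrate. All numbers are made up for illustration, and the function names are hypothetical. One assumption is worth flagging: the simple-average weights are normalized to sum to one here so that the result is an average rather than a sum.

```python
from collections import Counter

def aggregate(dprime, weights):
    """Equation 8: <d'> = sum over E of m_E * d'_E."""
    return sum(weights.get(E, 0.0) * d for E, d in dprime.items())

def average_weights(dprime):
    """Equal weight at every eccentricity (normalized; an assumption)."""
    n = len(dprime)
    return {E: 1.0 / n for E in dprime}

def normalized_dprime_weights(dprime):
    """Equation 9: m_E = norm(d'_E), with the weights summing to one."""
    total = sum(dprime.values())
    return {E: d / total for E, d in dprime.items()}

def closest_fixation_weights(min_eccentricities):
    """Equation 10: m_E = p(E_min), estimated as a histogram over trials."""
    counts = Counter(min_eccentricities)
    n = len(min_eccentricities)
    return {E: c / n for E, c in counts.items()}

# Illustrative d'_E values at 0, 3, and 6 degrees (not measured data).
dprime = {0: 3.0, 3: 2.0, 6: 1.0}
# Illustrative closest-fixation eccentricity on four signal-present trials.
closest = [0, 0, 3, 6]

avg = aggregate(dprime, average_weights(dprime))
norm_weighted = aggregate(dprime, normalized_dprime_weights(dprime))
et_closest = aggregate(dprime, closest_fixation_weights(closest))
print(avg, norm_weighted, et_closest)
```

Keeping the weights as a separate dictionary keyed by eccentricity makes it easy to swap in an empirically measured $p(E_{min})$ without touching the aggregation step.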
In practice, $p(E_{min})$ can be estimated empirically by using an eye tracker during the observers' search. In this paper, we used the eye position measurements to calculate the closest fixation point to the signal location (in signal-present trials) and created a probability histogram of the distance distributions. Figure 1a shows three samples of eye movements from three different observers for the same trial. Solid lines represent saccades and fixations. The white circle is placed at the location of the signal. Using these fixations and the signal location, we can calculate the closest fixation point for each trial (dotted line) and estimate a probability distribution.
Explicit knowledge of observers' eye movement behavior requires an eye tracker that is not always available to researchers and technology developers. Thus, we explore a simpler approximation to the distribution of the minimum retinal eccentricities of the signal. This method used the median fixation time from the eye tracker (ET) and the trial search time. We combined these numbers with the stimulus size (in degrees of visual angle) to estimate the number of fixations per trial and infer the closest fixation to the signal. The method first estimates the number of fixations per trial as the ratio of response time to fixation time. We used the average fixation time, 250 ms in 2D search and 500 ms in 3D search, and the average response times, which were 3.16 seconds for 2D and 22.62 seconds for 3D search. We also considered the number of scrolls per trial for the 3D search, which was 100 slices, divided the number of fixations by the number of slices, and assumed a minimum of 1 fixation per slice. We then assumed that the fixations are distributed along equidistant points on a rectangular grid. We subsequently calculated the minimum distance to any fixation for a signal placed at each $(x, y)$ location in the image:
$$D(x, y) = \min_{J} \left[ \sqrt{(x - x_{fix,j})^2 + (y - y_{fix,j})^2} \right] \tag{11}$$
where $x_{fix,j}$ and $y_{fix,j}$ refer to the location of the $j$th fixation, and min is the minimum function across the $J$ fixations. The probability of each minimum eccentricity can be calculated:
$$m_E = p(\langle E_{min} \rangle) = \frac{1}{wh} \sum_{x}^{w} \sum_{y}^{h} [D(x, y) = E_{min}] \tag{12}$$
where $h$ and $w$ stand for the height and width of the image and $[\ldots]$ are Iverson brackets, defined as $[Q] = 1$ if $Q$ is true and $[Q] = 0$ if $Q$ is false. In practice, we bin minimum retinal eccentricities into various discrete categories, resulting in a discrete number of eccentricities. Figure 1b shows a graphic representation of the minimum distance (grey value) as a function of signal position for various equidistant fixations. Dark values represent small distances to the fixation point, while brighter values represent larger distances.
A fundamental component in calculating $d'$ as a function of retinal eccentricity is the estimation of the appropriate templates, $\mathbf{w}_E$. Lago et al. [80] used a model in which perceptual templates decrease in spatial resolution with increasing distance away from the point of fixation (retinal eccentricity). For the classic Channelized Hotelling Observer (CHO), this is implemented by changing the channels as a function of eccentricity (foveated CHO, or FCHO). For the Non-Prewhitening model with Eye Filter (NPWE), the process entails varying the eye filter with eccentricity (foveated NPWE, or FNPWE). The 3D component was also constructed by stacking the 2D templates corresponding to each slice of the signal.
Foveated Channelized Model Observer (FCHO)
For the FCHO model, we modified the Gabor channels as a function of the distance from the target to the point of fixation. At the fovea (0 degrees of eccentricity), we use the original Gabor channels from the standard CHO model: 8 orientations and 6 spatial frequencies (16, 8, 4, 2, 1, and 0.5 cycles per degree). Then, as the distance increases, the size of all the Gabor channels is non-linearly scaled with respect to the eccentricity $E$ in degrees of visual angle:
$$scaling = 1 + \alpha E^{\beta} \tag{13}$$
where $\alpha = 0.7063$ and $\beta = 1.6953$, together with the internal noise parameter $K = 2.7813$ (see Adjusting Internal Noise). The scaling parameters were optimized to predict performance for the two studied signals as a function of retinal eccentricity (Lago et al., 2019 [80]). The central spatial frequencies of the Gabors decrease inversely to the scaling. That way, the spatial frequencies (in cycles/degree) for an eccentricity of 1 degree of visual angle (dva) would be 9.3770, 4.6885, 2.3443, 1.1721, 0.5861, and 0.2930; for 2 dva, 4.8672, 2.4336, 1.2168, 0.6084, 0.3042, and 0.1521; for 3 dva, 2.8837, 1.4419, 0.7209, 0.3605, 0.1802, and 0.0901; and so on for further eccentricities. Additionally, Gabor channels with a frequency smaller than 0.15 cycles/degree are removed from the template because their size is bigger than the image. Figure 2a shows how channels of a given foveal spatial frequency and orientation scale up with retinal eccentricity. The figure shows all eight orientations but only one spatial frequency. All 8 orientations are used for all eccentricities. These templates have access to fewer high spatial frequency-tuned channels at higher eccentricities, thus reducing their signal detection accuracy.
Foveated Non-Prewhitening Model Observer with Eye Filter (FNPWE)
The foveated extension of the NPWE was implemented by changing the contrast sensitivity function, or eye filter, as a function of the distance from the fixation point to the signal. As the distance increases, the new contrast sensitivity function is calculated with respect to the eccentricity $E$ in degrees of visual angle:
$$\mathcal{E}(\rho) = (\rho E^{n})^{\alpha} \exp\left(-\beta (\rho E^{n})^{\gamma}\right) \tag{14}$$
Values for $\alpha$, $\beta$, $\gamma$, and $n$ were optimized to predict the human performance for the two signals as a function of retinal eccentricity ($\alpha = 0.83$, $\beta = 0.35$, $\gamma = 0.4$, and $n = 2.2$). Figure 2b shows how the eye filter's sensitivity to spatial frequencies varies with retinal eccentricity. The figure shows three retinal eccentricities and how the model's access to high spatial frequencies diminishes with retinal eccentricity.
Adjusting Internal Noise
The final decision variable accounted for additive internal noise sampled from a Gaussian distribution, $\epsilon_{int} \sim \mathcal{N}(0, (K\sigma_\lambda)^2)$, whose standard deviation was proportional to the standard deviation of the model response $\sigma_\lambda$; this adds one fitting parameter, $K$, to the model. This parameter was fit to 2.78 for the FCHO model and to 15.13 for the FNPWE model.
To validate the image quality metrics for the foveated model observers, we compared their performance predictions to those of the full implementation of the sample-driven Foveated Search Model (FSM). The complete FSM uses the different templates across eccentricities to calculate decision variables, explores the images through eye movements and scrolls, integrates decision variables for each location across the search process, and reaches final decisions about signal presence/absence. Here, we briefly discuss the FSM and refer to Lago et al. 2019 [80] for full details of the model. The FSM processes the entire image in parallel with the different templates given by the distance of the image region from the point of fixation. The template response at the coordinate $p$ is calculated by using a template for the retinal eccentricity determined by the distance of the image subregion, $\mathbf{g}_p$, from the current fixation:
$$\lambda_p = \mathbf{w}_E^t \mathbf{g}_p + \epsilon_{int} \tag{15}$$
To construct these templates, we use the foveated Channelized Hotelling Observer (FCHO) described above. Once the templates are built, we assign an explicit fixation point, and we take the templates' responses at the corresponding distance (in degrees of visual angle) to the given fixation point. Template responses are transformed to a likelihood ratio. Information is integrated across fixations by multiplying the likelihood ratios of each fixation. The final decision takes the highest likelihood ratio for all fixations (and 3D scrolls). For this paper, the FSM uses human eye movements and 3D scrolls collected during the human observer experiments. The FSM is guided by the list of fixations/scrolls in the corresponding trial for each participant. This way, the model processes the same fixation locations that the human observer visited.
Materials and Methods
The $d'$ metrics above were evaluated for a 2D search and a 3D search of two signal types with different sizes. The foveated model observer (FCHO and FNPWE) parameters were fit to a separate experiment that measured human detection of the signals as a function of retinal eccentricity in a location-known-exactly task. Here, we briefly describe the tasks but refer to Lago et al., 2021 [63] for more detail.
Stimuli for the synthetic noise field were generated from 3D white noise fields ($\mu = 128$, $\sigma = 25$) filtered by a power spectrum ($f^{-2.8}$) using frequency indexes. The noise field size was 1024×820×100 voxels. Two signals were generated: a small sharp-edged sphere (0.13 degrees of visual angle) that we refer to as MCALC and a 3D Gaussian blob that we refer to as MASS (3 standard deviations = 0.66 degrees of visual angle). Stimuli had a 50% chance of containing one of the two targets (divided 50:50 between MCALC and MASS). The energy of the MASS signal was concentrated in the lower range of spatial frequencies, as was the background noise. The amplitude of the targets was set to be 83 gray levels over the background mean (128) at their peak. To generate the 2D trials, we selected the central slice of the signal (most information present), or a random slice on signal-absent trials.
Seven undergraduate students at the University of California, Santa Barbara, participated in exchange for course credit. Observers sat at a distance of 75 cm from a 1280×1024 resolution monitor in a darkened room. The monitor luminance was linearly calibrated between ~0 cd/m² and 111 cd/m² for gray levels 0 and 255, respectively. At the beginning of each trial, observers were asked to search for a specific target without a time limit. 2D and 3D trials were interleaved. For the 3D search, observers were able to scroll freely using the mouse. A non-overlapping scroll bar was present on the right side of the screen. When observers pressed the spacebar, a question asking about the presence of the target was shown.
Feedback was always given, showing the correct response. A real-time eye tracker (Eyelink 1000, SR Research Inc.) recorded fixations. We used the vendor's default parameters: eye velocity and acceleration thresholds of 30 degrees/sec and 9,500 degrees/sec², respectively. Figure 3 shows the timeline for one trial of this experiment. Participants provided informed consent and were treated according to the human subject research protocols approved by the University of California, Santa Barbara: 12-18-0025, 12-16-0806, and 12-15-0796.
To determine how the detectability of both signals decreases at increasing eccentricities, we ran a separate gaze-contingent experiment. Seven human observers participated in this study. Human observers were asked about the presence of a given signal (50% between MCALC or MASS) at a cued location while fixating at different distances. The signal was present in 50% of the trials. An eye tracker verified that participants did not make an eye movement; trials with eye movements were discarded. Distances measured were 1, 3, and 6 degrees of visual angle for the MCALC and 0, 3, 6, and 9 degrees for the MASS. FCHO and FNPWE parameters were fit to human data from this experiment.
To quantify the ability of the various models to predict human performance, we computed a log-likelihood measure (Table 1) using the following formula:
$$LL = -\log\left[ \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{(human - model)^2}{2\sigma^2} \right) \right] \tag{16}$$
Results
We first present the result of fitting the FCHO and FNPWE models to the human data for the detection of the two signals (MCALC and MASS) as a function of retinal eccentricity (Figure 4a). A single set of model parameters was used to fit the curves for both signals simultaneously. Results show that detectability falls off more steeply with eccentricity for the MCALC signal than for the MASS signal.
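One ingredient of this falloff in the FCHO is the channel scaling of equation 13. As a minimal sketch, the code below uses the $\alpha$ and $\beta$ values reported in the Theory section to recompute the channel center frequencies listed there and to count how many channels survive the 0.15 cycles/degree cutoff. The function names are hypothetical, and the code covers only the frequency bookkeeping, not the construction of the Gabor channels themselves.

```python
# FCHO channel scaling (equation 13) with the parameters reported in the text.
ALPHA, BETA = 0.7063, 1.6953
FOVEAL_FREQS = [16, 8, 4, 2, 1, 0.5]   # cycles/degree at the fovea
MIN_FREQ = 0.15                        # channels below this are removed

def channel_scaling(ecc_deg):
    """Size scaling of the Gabor channels at eccentricity E (degrees)."""
    return 1.0 + ALPHA * ecc_deg ** BETA

def channel_frequencies(ecc_deg):
    """Center frequencies decrease inversely with the scaling."""
    scale = channel_scaling(ecc_deg)
    return [f / scale for f in FOVEAL_FREQS]

def usable_frequencies(ecc_deg):
    """Keep only channels at or above the 0.15 cycles/degree cutoff."""
    return [f for f in channel_frequencies(ecc_deg) if f >= MIN_FREQ]

for ecc in (0, 1, 2, 3):
    freqs = [round(f, 4) for f in channel_frequencies(ecc)]
    print(ecc, freqs, len(usable_frequencies(ecc)))
# At 1, 2, and 3 degrees the highest frequency is about 9.377, 4.867,
# and 2.884 cycles/degree, matching the values listed in the text.
```

At 3 degrees the lowest-frequency channel (about 0.09 cycles/degree) falls below the cutoff, so only five of the six frequency bands remain.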
To integrate the $d'_E$ indices into a single figure of merit, we considered various methods. Figure 4b shows the various weighting functions considered. The simplest weighting functions are the average $d'_E$ and the normalized $d'_E$ weighting. The other two schemes propose a weighting based on the probability distribution of the closest fixation to the signal (separately shown for the MCALC and MASS signals). The ET closest fix approach uses actual fixation measurements from the observers to estimate the probabilities. An interesting finding is that the distribution of the closest fixation varies across signal types. For the MASS signal, the distribution of the closest fixation is similar across 2D and 3D search. In contrast, for the MCALC signal, the distribution is narrower for the 2D search and broader for the 3D search, suggesting that observers have difficulty guiding their eye movements and fixating the small signal in the 3D search. The last method to integrate the various $d'_E$, Time closest fixation, also relies on the closest fixation to the signal but approximates these distributions based on observer decision times and a simplifying assumption about the fixations. Figure 4b (right graph) shows the weighting function for this latter method. Performance predictions for the 2D and 3D search require taking the dot product of the $d'_E$ function (Figure 4a) and the weighting function (Figure 4b, equation 8).
Figure 5 shows human and estimated $d'$ performance using different weighting functions for 2D (left) and 3D (right) search. The top panel corresponds to the metrics based on the FCHO model, and the bottom panel to the FNPWE model. Both panels include human performance as measured by the empirical $d'$ calculated from seven observers. The panels also show the predicted $d'$ using the full implementation of a Foveated Search Model (FSM) that executes the same eye movements and scrolls of each human observer for the corresponding trial [62].
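The Time closest fixation weighting can be sketched end to end under stated assumptions. The fixation and response times are the averages reported in Materials and Methods; everything else (the toy 40 × 30 image in arbitrary units, the 4 × 3 grid layout, the bin size, and all function names) is hypothetical and chosen only to keep the example small.

```python
import math
from collections import Counter

def fixations_per_image(response_time_s, fixation_time_s, n_slices=1):
    """Fixations per 2D image or per slice, with a minimum of one."""
    per_slice = (response_time_s / fixation_time_s) / n_slices
    return max(1, round(per_slice))

def grid_fixations(width, height, n_cols, n_rows):
    """Centers of an n_cols x n_rows grid of equal cells (assumed layout)."""
    return [((c + 0.5) * width / n_cols, (r + 0.5) * height / n_rows)
            for r in range(n_rows) for c in range(n_cols)]

def min_distance(x, y, fixations):
    """Equation 11: distance from (x, y) to the closest fixation."""
    return min(math.hypot(x - fx, y - fy) for fx, fy in fixations)

def time_closest_weights(width, height, fixations, bin_size):
    """Equation 12: fraction of signal positions in each distance bin."""
    counts = Counter()
    for y in range(height):
        for x in range(width):
            d = min_distance(x + 0.5, y + 0.5, fixations)
            counts[int(d // bin_size) * bin_size] += 1
    total = width * height
    return {E: n / total for E, n in sorted(counts.items())}

# Average times reported in the text: 3.16 s / 250 ms (2D) and
# 22.62 s / 500 ms over 100 slices (3D).
n_fix_2d = fixations_per_image(3.16, 0.25)
n_fix_3d = fixations_per_image(22.62, 0.5, n_slices=100)

# Toy 40 x 30 image: a 4 x 3 grid roughly approximates the 2D fixation
# count, and a single central fixation stands in for one 3D slice.
w_2d = time_closest_weights(40, 30, grid_fixations(40, 30, 4, 3), bin_size=5)
w_3d = time_closest_weights(40, 30, grid_fixations(40, 30, 1, 1), bin_size=5)
print(n_fix_2d, n_fix_3d)
print(w_2d)
print(w_3d)
```

With fewer fixations per image, the probability mass shifts toward larger minimum eccentricities, which is what lowers the aggregated $d'$ for signals whose detectability drops quickly in the periphery. The resulting weights can be combined with a measured $d'_E$ function through the dot product described above.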
The pattern of results is similar across both model observers (FCHO and FNPWE). The average $d'_E$ metric mispredicts the relative search performance for the two signals in 2D search. The $d'_E$ weighted method mispredicts the relative performance for the two signals in 3D search. All metrics based on the closest fixation provided a better approximation of the relative performance of humans and the FSM across conditions and signals. Table 1 shows the negative log-likelihood of observing the human data given each model. A smaller value suggests a better fit. The results confirm that the
Discussion
Search in three-dimensional images requires observers to use the visual periphery to sample the slices through eye movements and can lead to performance dissociations with 2D search. In particular, small signals, which are difficult to detect in the visual periphery, result in these dissociations. Our results showing how the distribution of the closest fixation to the signal varies for the MCALC signal in 2D and 3D search illustrate this point. The finding is related to the smaller size and sharper edges of the MCALC, which are progressively filtered out with increasing eccentricity. The high spatial frequency information is particularly important for tasks in 1/f noise, where the noise decreases with spatial frequency. For 2D search, a great portion of the search space can be covered by the fixational sampling, and results show that observers fixate the signal in 40% of the trials. For the 3D search, consisting of 100 slices, the observers only cover a small fraction of the regions and fixate the signal in only 15% of the trials. The Foveated Search Model (FSM) shows strong agreement with human detectability for both MCALC and MASS in both tasks. However, the computational complexity of the full FSM calculation can sometimes reduce its usability. The proposed metrics presented in this paper are based on how the two signals are detected in the periphery (Figure 4a) and how that interacts with their detectability in 2D vs. 3D search. The metrics are based on classic expressions for the index of detectability commonly used in the medical image quality community [1], [2], [5], [7], [64], [72], [81]. When the $d'$ indices are combined with a simple average, results do not differ between 2D and 3D, and the estimate is higher for the MASS in both (similar to human results in 3D).
Additionally, when using a normalized $d'$ as the weighting function, results show the opposite pattern: MCALC shows a higher $d'$ than MASS for both 2D and 3D (similar to human results in 2D). Finally, using an estimation of closest distances from either eye movements (ET closest) or trial times (Time closest) provides a $d'$ estimation that predicted human results (and FSM results) for both signals in 2D and 3D search. Although this estimation is still far from human results, it can explain possible dissociations in signal detectability in 2D and 3D search that are not captured by standard model observers. Table 2 summarizes the advantages and disadvantages of each of the methods presented in this paper and standard model observers in the literature [4], [5], [39], [81]–[83].
Conclusion
As with all modalities, evaluating 3D medical imaging systems is an important step for characterizing their performance and optimizing the many components that define them. However, model observers traditionally used for this purpose might not capture the complexities of human visual search in volumetric images. Foveated models like the FSM can be a solution in this case, but their computational cost is sometimes prohibitive. This paper presented two different ways to overcome this complexity by estimating the closest fixations during trials and combining them with the signal detectability at different retinal eccentricities. The methodologies might not provide quantitative predictions of human performance as accurate as a foveated search model. Still, they can highlight potential dissociations in signal detectability that point to the need for more investigation of a specific imaging system, and they are thus a useful tool for engineers and medical physicists.
Disclosures
This research was performed under an IRB protocol for human data (12-16-0806) approved by the University of California, Santa Barbara. The authors declare no conflict of interest.
Acknowledgments
The research was funded by the National Institutes of Health grants R01 EB018958 and R01 EB026427. We thank the NIH RSNA Perception lab at the RSNA annual meeting for providing an excellent opportunity to conduct psychophysics experiments with radiologist participants.
References
[1] H. H. Barrett and K. J. Myers, Foundations of Image Science. John Wiley and Sons, 2004.
[2] R. F. Wagner and K. E. Weaver, “An assortment of image quality indices for radiographic film-screen combinations - can they be resolved?,” in Proc. SPIE, 1972, vol. 35, pp. 83–94.
[3] I. Reiser and R. M. Nishikawa, “Task-based assessment of breast tomosynthesis: effect of acquisition parameters and quantum noise,” Med. Phys., vol. 37, no. 4, pp. 1591–1600, Apr. 2010.
[4] H. H. Barrett, J. Yao, J. P. Rolland, and K. J. Myers, “Model observers for assessment of image quality,” Proc. Natl. Acad. Sci., vol. 90, no. 21, pp. 9758–9765, 1993.
[5] A. Burgess, “Signal Detection in Radiology,” The Handbook of Medical Image Perception and Techniques, Dec. 2018. /core/books/handbook-of-medical-image-perception-and-techniques/signal-detection-in-radiology/ABAFC2858D94FD09EAB67204978B2D6E/core-reader (accessed Nov. 07, 2019).
[6] A. Burgess, R. Wagner, R. Jennings, and H. B. Barlow, “Efficiency of human visual signal discrimination,” Science, vol. 214, no. 4516, pp. 93–94, 1981.
[7] C. K. Abbey, “Observer Models as a Surrogate to Perception Experiments,” in The Handbook of Medical Image Perception. New York: Cambridge University Press, 2010.
[8] M. Eckstein, “A practical guide to model observers for visual detection in synthetic and natural noisy images,” Phys. Psychophys., 2000, Accessed: Dec. 27, 2018. [Online]. Available: https://ci.nii.ac.jp/naid/10024347085/.
[9] C. Castella et al., “Mass detection on mammograms: influence of signal shape uncertainty on human and model observers,” J. Opt. Soc. Am. A, vol. 26, no. 2, pp. 425–436, Feb. 2009, doi: 10.1364/JOSAA.26.000425.
[10] J. P. Rolland and H. H. Barrett, “Effect of random background inhomogeneity on observer detection performance,” J. Opt. Soc. Am. A, vol. 9, no. 5, pp. 649–658, May 1992.
[11] K. J. Myers, H. H. Barrett, M. C. Borgstrom, D. D. Patton, and G. W. Seeley, “Effect of noise correlation on detectability of disk signals in medical imaging,” J. Opt. Soc. Am. A, vol. 2, no. 10, pp. 1752–1759, Oct. 1985.
[12] A. E. Burgess and H. Ghandeharian, “Visual signal detection. II. Signal-location identification,” J. Opt. Soc. Am. A, vol. 1, no. 8, pp. 906–910, Aug. 1984.
[13] A. Burgess, “Statistically defined backgrounds: performance of a modified nonprewhitening observer model,” JOSA A, vol. 11, no. 4, pp. 1237–1242, 1994.
[14] K. J. Myers and H. H. Barrett, “Addition of a channel mechanism to the ideal-observer model,” JOSA A, vol. 4, no. 12, pp. 2447–2457, 1987.
[15] M. P. Eckstein and J. S. Whiting, “Visual signal detection in structured backgrounds. I. Effect of number of possible spatial locations and signal contrast,” J. Opt. Soc. Am. A Opt. Image Sci. Vis., vol. 13, no. 9, pp. 1777–1787, Sep. 1996.
[16] H. H. Barrett, C. K. Abbey, and M. P. Eckstein, “Stabilized estimates of Hotelling observer detection performance in patient-structured noise,” vol. 3340, pp. 27–43, 1998.
[17] M. P. Eckstein, “Human vs model observers in anatomic backgrounds,” in Proceedings of SPIE, San Diego, CA, USA, 1998, pp. 16–26, doi: 10.1117/12.306180.
[18] M. P. Eckstein, “Effect of image compression in model and human performance,” in Proceedings of SPIE, San Diego, CA, USA, 1999, pp. 243–252, doi: 10.1117/12.349649.
[19] A. E. Burgess, “Visual perception studies and observer models in medical imaging,” in Seminars in nuclear medicine, 2011, vol. 41, pp. 419–436.
[20] A. E. Burgess, F. L. Jacobson, and P. F. Judy, “Human observer detection experiments with mammograms and power-law noise,” Med. Phys., vol. 28, no. 4, pp. 419–437, Apr. 2001.
[21] M. P. Eckstein, J. L. Bartroff, C. K. Abbey, J. S. Whiting, and F. O. Bochud, “Automated computer evaluation and optimization of image compression of x-ray coronary angiograms for signal known exactly detection tasks,” Opt Express, vol. 11, no. 5, pp. 460–475, Mar. 2003, doi: 10.1364/OE.11.000460.
[22] A. S. Chawla, E. Samei, R. Saunders, C. Abbey, and D. Delong, “Effect of dose reduction on the detection of mammographic lesions: a mathematical observer model analysis,” Med. Phys., vol. 34, no. 8, pp. 3385–3398, Aug. 2007.
[23] Y. Zhang, B. T. Pham, and M. P. Eckstein, “Automated optimization of JPEG 2000 encoder options based on model observer performance for detecting variable signals in X-ray coronary angiograms,” IEEE Trans. Med. Imaging, vol. 23, no. 4, pp. 459–474, Apr. 2004, doi: 10.1109/TMI.2004.824153.
[24] B. Kim, M. Han, and J. Baek, “A Convolutional Neural Network-Based Anthropomorphic Model Observer for Signal Detection in Breast CT Images Without Human-Labeled Data,” IEEE Access, vol. 8, pp. 162122–162131, 2020, doi: 10.1109/ACCESS.2020.3021125.
[25] M. Han and J. Baek, “A convolutional neural network-based anthropomorphic model observer for signal-known-statistically and background-known-statistically detection tasks,” Phys. Med. Biol., Oct. 2020, doi: 10.1088/1361-6560/abbf9d.
[26] L. Platiša et al., “Channelized Hotelling observers for the assessment of volumetric imaging data sets,” J Opt Soc Am A, vol. 28, no. 6, pp. 1145–1163, Jun. 2011, doi: 10.1364/JOSAA.28.001145.
[27] A. Badano et al., “Evaluation of Digital Breast Tomosynthesis as Replacement of Full-Field Digital Mammography Using an In Silico Imaging Trial,” JAMA Netw Open, vol. 1(7):e185474, 2018.
[28] R. Zeng, A. Badano, and K. Myers, “Optimization of digital breast tomosynthesis (DBT) acquisition parameters for human observers: effect of reconstruction algorithms,” Phys. Med. Biol., Feb. 2017, doi: 10.1088/1361-6560/aa5ddc.
[29] C. Castella et al., “Mass detection in breast tomosynthesis and digital mammography: a model observer study,” in Proceedings of SPIE, Lake Buena Vista, FL, USA, 2009, pp. 72630O-72630O–10, doi: 10.1117/12.811131.
[30] A. A. Sanchez, E. Y. Sidky, I. Reiser, and X. Pan, “Comparison of human and Hotelling observer performance for a fan-beam CT signal detection task,” Med. Phys., vol. 40, no. 3, p. 031104, Mar. 2013, doi: 10.1118/1.4789590.
[31] R. Aufrichtig, P. Xue, C. W. Thomas, G. C. Gilmore, and D. L. Wilson, “Perceptual comparison of pulsed and continuous fluoroscopy,” Med. Phys., vol. 21, no. 2, pp. 245–256, 1994, doi: 10.1118/1.597285.
[32] A. H. Ba et al., “Anthropomorphic model observer performance in three-dimensional detection task for low-contrast computed tomography,” J. Med. Imaging, vol. 3, no. 1, p. 011009, Dec. 2015, doi: 10.1117/1.JMI.3.1.011009.
[33] M. P. Eckstein, J. S. Whiting, and J. P. Thomas, “Role of knowledge in human visual temporal integration in spatiotemporal noise,” J. Opt. Soc. Am. A Opt. Image Sci. Vis., vol. 13, no. 10, pp. 1960–1968, Oct. 1996.
[34] Y. Zhang, B. T. Pham, and M. P. Eckstein, “The effect of nonlinear human visual system components on performance of a channelized Hotelling observer in structured backgrounds,” IEEE Trans. Med. Imaging, vol. 25, no. 10, pp. 1348–1362, Oct. 2006.
[35] L. Yu et al., “Correlation between a 2D channelized Hotelling observer and human observers in a low-contrast detection task with multislice reading in CT,” Med. Phys., vol. 44, no. 8, pp. 3990–3999, Aug. 2017, doi: 10.1002/mp.12380.
[36] A. Ba et al., “Inter-laboratory comparison of channelized hotelling observer computation,” Med. Phys., vol. 45, no. 7, pp. 3019–3030, 2018, doi: 10.1002/mp.12940.
[37] M. Chen, J. E. Bowsher, A. H. Baydush, K. L. Gilland, D. M. DeLong, and R. J. Jaszczak, “Using the Hotelling observer on multi-slice and multi-view simulated SPECT myocardial images,” in Nuclear Science Symposium Conference Record, 2001 IEEE, 2001, vol. 4, pp. 2258–2262, doi: 10.1109/NSSMIC.2001.1009273.
[38] C. P. Favazza, K. A. Fetterly, N. J. Hangiandreou, S. Leng, and B. A. Schueler, “Implementation of a channelized Hotelling observer model to assess image quality of x-ray angiography systems,” J. Med. Imaging, vol. 2, no. 1, p. 015503, Mar. 2015, doi: 10.1117/1.JMI.2.1.015503.
[39] X. He and S. Park, “Model observers in medical imaging research,” Theranostics, vol. 3, no. 10, pp. 774–786, 2013.
[40] A. E. Burgess, X. Li, and C. K. Abbey, “Visual signal detectability with two noise components: anomalous masking effects,” J. Opt. Soc. Am. A Opt. Image Sci. Vis., vol. 14, no. 9, pp. 2420–2442, Sep. 1997.
[41] C. K. Abbey and H. H. Barrett, “Human- and model-observer performance in ramp-spectrum noise: effects of regularization and object variability,” JOSA A, vol. 18, no. 3, pp. 473–488, 2001.
[42] C. K. Abbey, “Practical issues and methodology in assessment of image quality using model observers,” in Proceedings of SPIE, Newport Beach, CA, USA, 1997, pp. 182–194, doi: 10.1117/12.273984.
[43] Y. Zhang, B. Pham, and M. P. Eckstein, “Evaluation of JPEG 2000 encoder options: human and model observer detection of variable signals in X-ray coronary angiograms,” IEEE Trans. Med. Imaging, vol. 23, no. 5, pp. 613–632, May 2004.
[44] F. O. Bochud, C. K. Abbey, and M. P. Eckstein, “Visual signal detection in structured backgrounds. III. Calculation of figures of merit for model observers in statistically nonstationary backgrounds,” J. Opt. Soc. Am. A Opt. Image Sci. Vis., vol. 17, no. 2, pp. 193–205, Feb. 2000.
[45] M. P. Eckstein, C. K. Abbey, and F. O. Bochud, “Visual signal detection in structured backgrounds. IV. Figures of merit for model performance in multiple-alternative forced-choice detection tasks with correlated responses,” J. Opt. Soc. Am. A Opt. Image Sci. Vis., vol. 17, no. 2, pp. 206–217, Feb. 2000.
[46] S. Park, E. Clarkson, M. A. Kupinski, and H. H. Barrett, “Efficiency of the human observer detecting random signals in random backgrounds,” JOSA A, vol. 22, no. 1, pp. 3–16, Jan. 2005, doi: 10.1364/JOSAA.22.000003.
[47] S. Park, B. D. Gallas, A. Badano, N. A. Petrick, and K. J. Myers, “Efficiency of the human observer for detecting a Gaussian signal at a known location in non-Gaussian distributed lumpy backgrounds,” J. Opt. Soc. Am. A Opt. Image Sci. Vis., vol. 24, no. 4, pp. 911–921, Apr. 2007, doi: 10.1364/josaa.24.000911.
[48] F. O. Bochud, C. K. Abbey, and M. P. Eckstein, “Search for lesions in mammograms: statistical characterization of observer responses,” Med. Phys., vol. 31, no. 1, pp. 24–36, Jan. 2004.
[49] X. He, F. Samuelson, R. Zeng, and B. Sahiner, “Discovering intrinsic properties of human observers’ visual search and mathematical observers’ scanning,” J. Opt. Soc. Am. A Opt. Image Sci. Vis., vol. 31, no. 11, pp. 2495–2510, Nov. 2014.
[50] R. G. Swensson and P. F. Judy, “Detection of noisy visual targets: models for the effects of spatial uncertainty and signal-to-noise ratio,” Percept. Psychophys., vol. 29, no. 6, pp. 521–534, Jun. 1981.
[51] H. C. Gifford, “A visual-search model observer for multislice-multiview SPECT images,” Med. Phys., vol. 40, no. 9, p. 092505, Sep. 2013, doi: 10.1118/1.4818824.
[52] H. C. Gifford, Z. Liang, and M. Das, “Visual-search observers for assessing tomographic x-ray image quality,” Med. Phys., vol. 43, no. 3, pp. 1563–1575, Mar. 2016, doi: 10.1118/1.4942485.
[53] H. Gifford, “Efficient visual-search model observers for PET,” Br. J. Radiol., vol. 87, no. 1039, p. 20140017, 2014.
[54] A. Sen and H. C. Gifford, “Accounting for anatomical noise in search-capable model observers for planar nuclear imaging,” J. Med. Imaging, vol. 3, no. 1, pp. 015502–015502, 2016.
[55] B. A. Lau, M. Das, and H. C. Gifford, “Towards Visual-Search Model Observers for Mass Detection in Breast Tomosynthesis,” Proc. SPIE-- Int. Soc. Opt. Eng., vol. 8668, Mar. 2013, doi: 10.1117/12.2008503.
[56] X. He, F. W. Samuelson, R. Zeng, and B. Sahiner, “Three scenarios of ranking inconsistencies involving search tasks,” 2016, vol. 9787, pp. 97870U-97870U–8, doi: 10.1117/12.2217617.
[57] M. A. Lago, C. K. Abbey, and M. P. Eckstein, “Foveated model observers to predict human performance in 3D images,” in Proc. SPIE, 2017, vol. 10136, doi: 10.1117/12.2252952.
[58] C. A. Curcio, K. R. Sloan, R. E. Kalina, and A. E. Hendrickson, “Human photoreceptor topography,” J. Comp. Neurol., vol. 292, no. 4, pp. 497–523, Feb. 1990, doi: 10.1002/cne.902920402.
[59] R. Rosenholtz, “Capabilities and limitations of peripheral vision,” Annu. Rev. Vis. Sci., vol. 2, pp. 437–457, 2016.
[60] M. P. Eckstein, “Visual search: A retrospective,” J. Vis., vol. 11, no. 5, 2011, doi: 10.1167/11.5.14.
[61] M. P. Eckstein, M. A. Lago, and C. K. Abbey, “The role of extra-foveal processing in 3D imaging,” in Proc. SPIE, 2017, vol. 10136, doi: 10.1117/12.2255879.
[62] M. A. Lago, C. K. Abbey, and M. P. Eckstein, “Foveated Model Observers for Visual Search in 3D Medical Images,” IEEE Trans. Med. Imaging, vol. PP, Dec. 2020, doi: 10.1109/TMI.2020.3044530.
[63] M. A. Lago et al., “Under-exploration of Three-Dimensional Images Leads to Search Errors for Small Salient Targets,” Curr. Biol. CB, Jan. 2021, doi: 10.1016/j.cub.2020.12.029.
[64] A. Burgess, “Signal Detection Theory: A Brief History,” The Handbook of Medical Image Perception and Techniques, Dec. 2018.
[65] E. Akbas and M. Eckstein, “Object detection through search with a foveated search visual system,” PLoS Comput. Biol., no. in press, 2017.
[66] G. J. Zelinsky, “A theory of eye movements during target acquisition,” Psychol. Rev., vol. 115, no. 4, pp. 787–835, Oct. 2008, doi: 10.1037/a0013118.
[67] B. R. Beutter, M. P. Eckstein, and L. S. Stone, “Saccadic and perceptual performance in visual search tasks. I. Contrast detection and discrimination,” J. Opt. Soc. Am. A Opt. Image Sci. Vis., vol. 20, no. 7, pp. 1341–1355, Jul. 2003.
[68] P. Verghese, “Active search for multiple targets is inefficient,” Vision Res., Aug. 2012, doi: 10.1016/j.visres.2012.08.008.
[69] M. P. Eckstein, W. Schoonveld, and S. Zhang, “Optimizing eye movements in search for rewards,” J. Vis., vol. 10, no. 7, p. 33, 2010, doi: 10.1167/10.7.33.
[70] J. Najemnik and W. S. Geisler, “Simple summation rule for optimal fixation selection in visual search,” Vision Res., vol. 49, no. 10, pp. 1286–1294, Jun. 2009, doi: 10.1016/j.visres.2008.12.005.
[71] C. K. Abbey, F. W. Samuelson, R. Zeng, J. M. Boone, M. P. Eckstein, and K. Myers, “Classification images for localization performance in ramp-spectrum noise,” Med. Phys., vol. 45, no. 5, pp. 1970–1984, May 2018, doi: 10.1002/mp.12857.
[72] C. K. Abbey, H. H. Barrett, and M. P. Eckstein, “Practical issues and methodology for using model observers as metrics of image quality,” vol. 3032, pp. 182–194, 1997.
[73] Y. Zhang, C. K. Abbey, and M. P. Eckstein, “Adaptive detection mechanisms in globally statistically nonstationary-oriented noise,” J. Opt. Soc. Am. A Opt. Image Sci. Vis., vol. 23, no. 7, pp. 1549–1558, Jul. 2006.
[74] C. K. Abbey and M. P. Eckstein, “Classification images for simple detection and discrimination tasks in correlated noise,” J. Opt. Soc. Am. A Opt. Image Sci. Vis., vol. 24, no. 12, pp. B110-124, Dec. 2007.
[75] Y. Zhang, “Evaluation of internal noise methods for Hotelling observers,” in Proceedings of SPIE, San Diego, CA, USA, 2005, pp. 162–173, doi: 10.1117/12.595861.
[76] D. M. Green and J. A. Swets, Signal Detection Theory and Psychophysics. Peninsula Pub, 1989.
[77] M. Hayhoe and D. Ballard, “Eye movements in natural behavior,” Trends Cogn. Sci., vol. 9, no. 4, pp. 188–194, Apr. 2005, doi: 16/j.tics.2005.02.009.
[78] M. F. Land, “Vision, Eye Movements, and Natural Behavior,” Vis. Neurosci., vol. 26, no. 01, pp. 51–62, 2009, doi: 10.1017/S0952523808080899.
[79] C. J. H. Ludwig, J. R. Davies, and M. P. Eckstein, “Foveal analysis and peripheral selection during active visual sampling,” Proc. Natl. Acad. Sci., p. 201313553, Jan. 2014, doi: 10.1073/pnas.1313553111.
[80] M. A. Lago, C. K. Abbey, and M. P. Eckstein, “A foveated channelized Hotelling search model predicts dissociations in human performance in 2D and 3D images,” in Medical Imaging 2019: Image Perception, Observer Performance, and Technology Assessment, Mar. 2019, vol. 10952, p. 109520D, doi: 10.1117/12.2511777.
[81] M. P. Eckstein, C. K. Abbey, and F. O. Bochud, “A practical guide to model observers for visual detection in synthetic and natural noisy images,” Handb. Med. Imaging, vol. 1, pp. 593–628, 2000.
[82] H. H. Barrett and K. J. Myers, Foundations of image science. John Wiley & Sons, 2013.
[83] E. Abadi et al., “Virtual clinical trials in medical imaging: a review,” J. Med. Imaging, vol. 7, no. 4, p. 042805, Apr. 2020, doi: 10.1117/1.JMI.7.4.042805.
Figure 1. a) Samples of eye movements (solid lines) and target location (circle) used to calculate the closest fixation point (dotted lines) from the eye tracker samples. b) Method to estimate the distribution of minimum signal retinal eccentricities from response time, median fixation time, and display size. Examples are for estimation of 1 fixation, 6 fixations, and 15 fixations. Grey levels stand for the different distances to the closest fixation point for each pixel in the image.
Figure 2: a) Scaling of the Gabor channels for different eccentricities for the foveated channelized Hotelling observer. b) Sample of Contrast Sensitivity Filter (CSF) for different eccentricities for the foveated non-prewhitening model observer with eye filter (right).
Figure 3: Timeline of the task for 2D and 3D search for human observers.
Figure 4. a) Estimated d’ from the FCHO and FNPWE models (lines) at each eccentricity (degrees visual angle) along with human perceptual performance (circles) for both signals. b) Each graph is for a different method to aggregate d’ across eccentricities into a single figure of merit.
Figure 5: Detectability index for human experiment in 2D (left) and 3D (right) and shorthand calculations (linear, average d’, d’-weighted, and closest fixation point) for both microcalcification (MCALC) and mass (MASS). Results are shown for both foveated channelized Hotelling observer (FCHO, squares) and the foveated non-prewhitening model observer with eye filter (FNPWE, triangles).
Table 1: Negative log-likelihoods between model observers and human observer data (ratio between MCALC and MASS for each model).
Avg. d’ d’-weighted ET closest Time closest FSM 2D
Conditions / Advantages / Disadvantages
Standard Model Observers
Advantages: Well-established and computationally simple methodology to assess task performance.
Disadvantages: Does not consider peripheral processing and erroneously predicts human performance of small signals in 3D search.
Foveated Search Model
Advantages: Excellent prediction of human performance in 3D search for different signals. Separately quantifies search and recognition errors.
Disadvantages: Computationally expensive and requires some knowledge of eye movement explorations using eye trackers.
Foveated Model Observer with closest fixation approximation
Advantages: Good approximation to the interaction of 3D search, eye movements, and signal’s visibility in the visual periphery. Computationally simpler than foveated search model.
Disadvantages: Requires an eye tracker to quantify eye movements and determine the closest fixation to signal. Does not partition errors into search and recognition errors.
Foveated Model Observer with decision time approximation