
Publication


Featured research published by Haomiao Jiang.


IEEE Transactions on Image Processing | 2017

Learning the Image Processing Pipeline

Haomiao Jiang; Qiyuan Tian; Joyce E. Farrell; Brian A. Wandell

Many creative ideas are being proposed for image sensor designs, and these may be useful in applications ranging from consumer photography to computer vision. To understand and evaluate each new design, we must create a corresponding image processing pipeline that transforms the sensor data into a form that is appropriate for the application. Designing and optimizing these pipelines is time-consuming and costly. We explain a method that combines machine learning and image systems simulation to automate the pipeline design. The approach is based on a new way of thinking of the image processing pipeline as a large collection of local linear filters. We illustrate how the method has been used to design pipelines for novel sensor architectures in consumer photography applications.
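As a rough illustration of the "collection of local linear filters" idea, the sketch below classifies each sensor patch and applies a class-specific linear kernel to produce the rendered RGB value. It is a minimal toy under stated assumptions, not the published L3 implementation: the patch size, the class rule, and the random kernels are placeholders.

```python
# Minimal sketch of the "pipeline as local linear filters" idea (illustrative only).
# The class rule and the kernels here are placeholders, not the published L3 parameters.
import numpy as np

PATCH = 5                                  # 5x5 sensor neighborhood around each pixel
N_CLASSES = 4                              # e.g., response-level bins
rng = np.random.default_rng(0)
kernels = rng.normal(size=(N_CLASSES, PATCH * PATCH, 3))  # one (P x 3) kernel per class

def classify(patch):
    """Toy class rule: bin the patch by its mean response level."""
    level = patch.mean()
    return min(int(level * N_CLASSES), N_CLASSES - 1)

def render(raw):
    """Map a raw mosaic to RGB by applying the kernel of each pixel's class."""
    h, w = raw.shape
    pad = PATCH // 2
    padded = np.pad(raw, pad, mode="reflect")
    out = np.zeros((h, w, 3))
    for y in range(h):
        for x in range(w):
            patch = padded[y:y + PATCH, x:x + PATCH].ravel()
            out[y, x] = patch @ kernels[classify(patch)]
    return out

rgb = render(rng.random((32, 32)))         # synthetic raw frame, values in [0, 1]
```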


Electronic Imaging | 2017

Simulating retinal encoding: factors influencing Vernier acuity.

Haomiao Jiang; Nicolas P. Cottaris; James Golden; David H. Brainard; Joyce E. Farrell; Brian A. Wandell

Humans resolve the spatial alignment between two visual stimuli at a resolution that is substantially finer than the spacing between the foveal cones. In this paper, we analyze the factors that limit the information at the cone photoreceptors that is available to make these acuity judgments (Vernier acuity). We use the open-source software ISETBIO to quantify the stimulus and encoding stages in the front end of the human visual system, starting with a description of the stimulus spectral radiance and a computational model that includes the physiological optics, inert ocular pigments, eye movements, photoreceptor sampling and absorptions. The simulations suggest that the visual system extracts the information available within the spatiotemporal pattern of photoreceptor absorptions within a small spatial (0.12 deg) and temporal (200 ms) regime. At typical display luminance levels, the variance arising from Poisson absorptions and from small eye movements (tremors and microsaccades) both appear to be critical limiting factors for Vernier acuity.
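The role of Poisson variability can be appreciated with a small simulation. The sketch below is not the ISETBIO front-end model; the one-dimensional cone row, aperture blur, mean absorption counts, and template readout are invented for illustration of how photon noise limits discrimination of offsets smaller than the cone spacing.

```python
# Rough illustration (not the ISETBIO computation): Poisson noise on cone
# absorptions limits how reliably a tiny spatial offset can be detected.
# Mean absorption counts and the cone/aperture geometry below are made up.
import numpy as np

rng = np.random.default_rng(1)
cone_positions = np.arange(-10, 11) * 0.5        # arcmin, toy 1-D row of cones
aperture_sigma = 0.5                              # arcmin, toy blur of optics + aperture

def mean_absorptions(offset, peak=500.0):
    """Expected absorptions per cone for a blurred line at the given offset."""
    return peak * np.exp(-0.5 * ((cone_positions - offset) / aperture_sigma) ** 2)

def proportion_correct(offset, n_trials=2000):
    """Fraction of trials in which a template observer picks the offset stimulus."""
    t0, t1 = mean_absorptions(0.0), mean_absorptions(offset)
    template = t1 - t0                            # signal-known-exactly template
    correct = 0
    for _ in range(n_trials):
        r0 = rng.poisson(t0)                      # Poisson draw, aligned stimulus
        r1 = rng.poisson(t1)                      # Poisson draw, offset stimulus
        correct += (r1 @ template) > (r0 @ template)
    return correct / n_trials

for off in (0.05, 0.1, 0.2):                      # offsets well below the cone spacing
    print(f"offset {off} arcmin -> {proportion_correct(off):.2f} correct")
```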


Electronic Imaging | 2015

Automatically designing an image processing pipeline for a five-band camera prototype using the local, linear, learned (L3) method

Qiyuan Tian; Henryk Blasinski; Steven Lansel; Haomiao Jiang; Munenori Fukunishi; Joyce E. Farrell; Brian A. Wandell

The development of an image processing pipeline for each new camera design can be time-consuming. To speed camera development, we developed a method named L3 (Local, Linear, Learned) that automatically creates an image processing pipeline for any design. In this paper, we describe how we used the L3 method to design and implement an image processing pipeline for a prototype camera with five color channels. The process includes calibrating and simulating the prototype, learning local linear transforms and accelerating the pipeline using graphics processing units (GPUs).


Journal of Vision | 2015

A spectral estimation method for predicting between-eye color matches in unilateral dichromats

Haomiao Jiang; Joyce E. Farrell; Brian A. Wandell

INTRODUCTION: There are several reports describing color vision in subjects who are dichromatic in one eye and trichromatic in the other. Between-eye color matches in these unilateral dichromats have been used to model color appearance for dichromats (Brettel et al., 1997). METHODS: We describe a theoretical principle that makes specific predictions about the mapping from the two cone class absorptions in the dichromatic eye to the three cone class absorptions in the trichromatic eye. Specifically, we propose that the brain estimates a spectral power distribution consistent with the two measured cone absorption rates; we use this estimate to predict the equivalent absorption rate for the missing cone type. Spectral estimation is a severely underconstrained problem, and we examined the implications of different constraints, including (a) a smoothness constraint, (b) a non-negativity constraint, and (c) natural scene priors. We implemented these calculations in open-source software. RESULTS: Under some assumptions (smoothness only), a single linear transformation converts the dichromatic cone absorptions to the estimate for the missing cone class. This transformation matches some, but not all, of the color matches in unilateral dichromats. Adding the non-negativity assumption results in a nonlinear relationship between the two measured cone class absorptions and the estimated absorptions for the missing cone class. This observation predicts which wavelengths of light will appear the same to the dichromatic and trichromatic eyes (isochromes). The non-negativity constraint improves the agreement between predictions and measurements in unilateral dichromats (Alpern et al., 1983). CONCLUSION: Establishing a quantitative map from the two cone classes in a dichromat to a missing cone class has practical value for estimating color appearance matches between dichromats and trichromats (Brettel et al.; Vischeck). In addition, we explain how the method can be useful for implementing a color difference metric for dichromatic observers. Meeting abstract presented at VSS 2015.
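A minimal sketch of the estimation idea, under stated assumptions: the cone "fundamentals" below are toy Gaussians rather than measured sensitivities, and the smoothness weight is arbitrary. With the smoothness term alone the estimate is an ordinary least-squares solve (hence a linear map from the two measured absorptions to the predicted third); swapping in non-negative least squares imposes the non-negativity constraint.

```python
# Sketch of the spectral-estimation idea with placeholder data: the cone
# "fundamentals" are toy Gaussians, not measured curves, and the smoothness
# weight lam is arbitrary.
import numpy as np
from scipy.optimize import nnls

wave = np.arange(400, 701, 10)                     # nm
def gauss(peak, width=40.0):
    return np.exp(-0.5 * ((wave - peak) / width) ** 2)

Lcone, Mcone, Scone = gauss(565), gauss(535), gauss(440)   # toy cone fundamentals
A = np.vstack([Lcone, Scone])                       # available classes (e.g., deuteranope)
missing = Mcone                                     # the class to be predicted

# A second-difference operator penalizes wiggly spectra (smoothness constraint).
D = np.diff(np.eye(len(wave)), 2, axis=0)
lam = 1e-2

def estimate_spectrum(r, nonneg=False):
    """Smoothest spectrum consistent (in least squares) with absorptions r."""
    M_aug = np.vstack([A, np.sqrt(lam) * D])
    b_aug = np.concatenate([r, np.zeros(D.shape[0])])
    if nonneg:
        s, _ = nnls(M_aug, b_aug)                   # adds the non-negativity constraint
    else:
        s, *_ = np.linalg.lstsq(M_aug, b_aug, rcond=None)
    return s

r = A @ gauss(500, 60)                              # absorptions produced by a test light
for flag in (False, True):
    s_hat = estimate_spectrum(r, nonneg=flag)
    print("nonneg" if flag else "smooth", "-> predicted missing-cone absorptions:", missing @ s_hat)
```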


bioRxiv | 2018

A computational observer model of spatial contrast sensitivity: Effects of wavefront-based optics, cone mosaic structure, and inference engine

Nicolas P. Cottaris; Haomiao Jiang; Xiaomao Ding; Brian A. Wandell; David H. Brainard

We present a computational observer model of the human spatial contrast sensitivity function (CSF) based on the Image Systems Engineering Tools for Biology (ISETBio) simulation framework. We demonstrate that ISETBio-derived CSFs agree well with CSFs derived using traditional ideal observer approaches when the mosaic, optics, and inference engine are matched. Further simulations extend earlier work by considering more realistic cone mosaics, more recent measurements of human physiological optics, and the effect of varying the inference engine used to link visual representations to psychophysical performance. Relative to earlier calculations, our simulations show that the spatial structure of realistic cone mosaics reduces upper bounds on performance at low spatial frequencies, whereas realistic optics derived from modern wavefront measurements lead to increased upper bounds at high spatial frequencies. Finally, we demonstrate that the type of inference engine used has a substantial effect on the absolute level of predicted performance. Indeed, the performance gap between an ideal observer with exact knowledge of the relevant signals and human observers is greatly reduced when the inference engine has to learn aspects of the visual task. ISETBio-derived estimates of stimulus representations at different stages along the visual pathway provide a powerful tool for computing the limits of human performance.
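To illustrate what "varying the inference engine" means computationally, the toy sketch below reads out the same Poisson-noisy cone responses in two ways: a template observer with exact knowledge of the signal, and a logistic-regression classifier that must learn it from labeled trials. This is not the ISETBio computational observer; the mosaic size, contrast, and mean absorption level are arbitrary assumptions.

```python
# Toy inference-engine comparison on simulated Poisson cone responses
# (illustrative only; parameters below are arbitrary).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
n_cones, mean_abs, contrast = 64, 100.0, 0.06
grating = np.sin(2 * np.pi * 4 * np.linspace(0, 1, n_cones))   # 4-cycle toy grating

def trial(has_signal):
    mean = mean_abs * (1 + contrast * grating * has_signal)
    return rng.poisson(mean).astype(float)

def percent_correct(decide, n=2000):
    """2AFC: correct when the decision value is larger on the signal interval."""
    ok = sum(decide(trial(1)) > decide(trial(0)) for _ in range(n))
    return ok / n

# 1) Observer with exact knowledge of the signal: correlate with the template.
template_engine = lambda r: r @ grating

# 2) Learned engine: logistic regression trained on labeled noisy trials.
X = np.array([trial(k % 2) for k in range(1000)])
y = np.arange(1000) % 2
clf = LogisticRegression(max_iter=1000).fit(X, y)
learned_engine = lambda r: clf.decision_function(r[None, :])[0]

print("template engine:", percent_correct(template_engine))
print("learned engine :", percent_correct(learned_engine))
```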


bioRxiv | 2018

Computational-Observer Analysis of Illumination Discrimination

Xiaomao Ding; Ana Radonjić; Nicolas P. Cottaris; Haomiao Jiang; Brian A. Wandell; David H. Brainard

The spectral properties of the ambient illumination provide useful information about the time of day and the weather. We study the perceptual representation of illumination by analyzing measurements of how well people discriminate between illuminations across scene configurations. More specifically, we compare human performance to a computational-observer analysis that evaluates the information available in the isomerizations of the cones in a model human photoreceptor mosaic. Some patterns of human performance are predicted by the computational observer; other aspects are not. The analysis clarifies which aspects of performance require additional explanation in terms of the action of visual mechanisms beyond the isomerization of light by the cones.


Archive | 2017

Characterization of visual stimuli using the standard display model

Joyce E. Farrell; Haomiao Jiang; Brian A. Wandell

Visual psychophysics advances through experiments that measure how sensations and perceptions arise from carefully controlled visual stimuli. Progress depends in large part on the type of display technology that is available to generate stimuli. In this chapter, we first describe the strengths and limitations of the display technologies that are currently used to study human vision. We then describe a standard display model that guides the calibration and characterization of visual stimuli on these displays (Brainard et al., 2002; Post, 1992). We illustrate how to use the standard display model to specify the spatial–spectral radiance of any stimulus rendered on a calibrated display. This model can be used by engineers to assess the trade-offs in display design and by scientists to specify stimuli so that others can replicate experimental measurements and develop computational models that begin with a physically accurate description of the experimental stimulus.
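A minimal sketch of the standard display model described above: digital RGB values pass through per-channel gamma lookups, and the resulting linear intensities weight the spectral power distributions of the display primaries (plus an ambient term). The 2.2 gamma, Gaussian primary spectra, and ambient level are placeholders for an actual display calibration.

```python
# Sketch of a standard display model with made-up calibration data:
# gamma lookup per channel, then a weighted sum of primary spectra.
import numpy as np

wave = np.arange(400, 701, 5)                       # nm
def gauss(peak, width):
    return np.exp(-0.5 * ((wave - peak) / width) ** 2)

primaries = np.stack([gauss(610, 30), gauss(545, 35), gauss(465, 25)])  # toy R, G, B SPDs
ambient = 0.001 * np.ones_like(wave)                # toy ambient (black-level) term

def radiance(rgb_digital, gamma=2.2):
    """Spectral radiance of one pixel from its digital RGB values in [0, 1]."""
    linear = np.clip(rgb_digital, 0, 1) ** gamma    # per-channel gamma lookup
    return linear @ primaries + ambient             # weighted sum of primary spectra

spd = radiance(np.array([0.8, 0.4, 0.1]))
print(spd.shape, spd.max())
```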


Electronic Imaging | 2016

A Spectral Estimation Theory for Color Appearance Matching

Haomiao Jiang; Joyce E. Farrell; Brian A. Wandell

There are several reports describing color vision in subjects who are dichromatic in one eye and trichromatic in the other. Formulae fit to the between-eye color appearance matches in such unilateral dichromats have been used to predict color appearance for dichromats [1][2]. In this paper, we describe a general principle, spectral estimation theory, that guides how to predict the mapping from the two cone class absorptions in a dichromatic eye to the three cone class absorptions in a trichromatic eye. The theory predicts matches by first estimating the smoothest non-negative spectral power distribution that is consistent with the measured cone absorption rates (e.g., of a dichromat). We then use this spectral estimate to calculate the absorption rate for the missing cone type (of the standard color observer). In addition to predicting the color appearance of dichromats, the theory predicts color appearance matches between color-anomalous subjects and the standard color observer. Finally, the theory offers guidance about the possible effects on color appearance of gene therapy treatments for colorblindness.

Introduction

Many students of color vision have asked how the color experienced by one person may appear to another person. A fascinating series of papers describes one attempt to answer this question by measuring between-eye color matches in unilateral dichromats: people who are dichromatic in one eye and trichromatic in the other. The between-eye color matching experiments were first reported by Judd et al. [3] and subsequently by Graham et al. [4] and Alpern et al. [5]. Judd et al. [3] reported that equal energy light and two narrow-band lights at 475 nm and 575 nm look the same when presented to both eyes of a unilateral protanope and a unilateral deuteranope. Alpern et al. [5] conducted a similar experiment with a unilateral tritanope and reported three eigen-colors: equal energy light and narrow-band lights at 485 nm and 660 nm. We refer to lights whose color appearance matches between the two eyes of a unilateral dichromat as eigen-colors. In 1995, Vienot et al. [1] used the data from these unilateral dichromats to calculate the equivalent color appearance in a standard color observer. Vienot et al. [1] and later Brettel et al. [2] noted that eigen-colors define an empirical map between the two cone coordinates in a dichromatic eye and the three cone coordinates in the trichromatic eye. They fit a piecewise linear relationship from the measured to missing cone types using the eigen-colors.


Electronic Imaging | 2016

Local Linear Approximation for Camera Image Processing Pipelines.

Haomiao Jiang; Qiyuan Tian; Joyce E. Farrell; Brian A. Wandell

Modern digital cameras include an image processing pipeline that converts raw sensor data to a rendered RGB image. Several key steps in the pipeline operate on spatially localized data (demosaicking, noise reduction, color conversion). We show how to derive a collection of local, adaptive linear filters (kernels) that can be applied to each pixel and its neighborhood; the adaptive linear calculation approximates the performance of the modules in the conventional image processing pipeline. We also derive a set of kernels from images rendered by expert photographers. In both cases, we evaluate the accuracy of the approximation by calculating the difference between the images rendered by the camera pipeline and the images rendered by the local, linear approximation. The local, linear and learned (L3) kernels approximate the camera and expert processing pipelines with a mean S-CIELAB error of ∆E < 2. An advantage of the local, linear architecture is that the parallel application of a large number of linear kernels works well on modern hardware configurations and can be implemented efficiently with respect to power.

Introduction

The image processing pipeline in a modern camera is composed of serially arranged modules, including dead pixel removal, demosaicing, sensor color conversion, denoising, illuminant correction and other components (e.g., sharpening or hue enhancement). To optimize the rendered image, researchers have designed and optimized the algorithms for each module and added new modules to handle different corner cases. The majority of commercial camera image processing pipelines consist of a collection of these specialized modules that are optimized for a single color filter array design, the Bayer pattern (one red, one blue and two green pixels in each repeating block).

New capabilities in optics and CMOS sensors have made it possible to design novel sensor architectures that promise to offer features extending the original Bayer RGB sensor design. For example, recent years have produced a new generation of architectures that increase spatial resolution [1], control depth of field through light field camera designs (Lytro, Pelican Imaging, Light.co), and extend dynamic range and sensitivity through novel arrangements of color filters [2-5] and mixed pixel architectures [6]. There is a need for an efficient process for building image rendering pipelines that can be applied to each of these new designs.

In 2011, Lansel et al. [7] proposed an image processing pipeline that efficiently combines several key modules into one computational step and whose parameters can be optimized using automated learning methods [8-10]. This pipeline maps raw sensor values into display values using a set of local, linear and learned filters, and thus we refer to it as the L3 method. The kernels for the L3 pipeline can be optimized using simple statistical methods. The L3 algorithm automates the design of key modules in the imaging pipeline for a given sensor and optics. The learning method can be applied to both Bayer and non-Bayer color filter arrays and to systems that use a variety of optics. We illustrated the method using both simulations [10] and real experimental data from a five-band camera prototype [9]. Computationally, the L3 algorithm relies mainly on a large set of inner products, which can be efficient and low power [11]. The L3 algorithm is part of a broader literature that explores how to incorporate new optimization methods into the image processing pipeline.
For example, Stork and Robinson [12] developed a method for jointly designing the optics, sensor and image processing pipeline of an imaging system; their optimization focused on the design parameters of the lens and sensor. Khabashi et al. [13] propose using simulation methods and Regression Tree Fields to design critical portions of the image processing pipeline. Heide et al. [14] have proposed that the image processing pipeline should be conceived of as a single, integrated computation that can be solved as an inverse problem using modern optimization methods. Instead of applying different heuristics for the separate stages of the traditional pipeline (demosaicing, denoising, color conversion), they rely on image priors and regularizers. Heide and colleagues [14, 15] use modern optimization methods and convolutional sparse coding to develop image pipelines as well as to address more general image processing problems, such as inpainting. The distinctive emphasis of the L3 method is how it couples statistical learning methods with a simple computational architecture to create new pipelines that are efficient for use on modern mobile devices.

Here we identify two new applications of the L3 pipeline. First, we show that the L3 pipeline can learn to approximate other highly optimized image processing pipelines. We demonstrate this by comparing the L3 pipeline with the rendering from a very high quality digital camera. Second, we show that the method can learn a pipeline that captures the personal preferences of individual users. We demonstrate this by arranging for the L3 pipeline to learn the transformations applied by a highly skilled photographer.

Proposed Method: Local Linear and Learned

In our previous work, we used image systems simulation to design a pipeline for novel camera architectures [9, 10]. We created synthetic scenes and camera simulations to produce sensor responses and the ideal rendered images. We used these matched pairs to define sensor response classes for which the transformation from the sensor response to the desired rendered image could be well approximated by an affine transformation. The L3 parameters define the classes, $C_i$, and the transformations from the sensor data to the rendered output for each class, $T_i$.

We use the same L3 principles to design an algorithm that learns the linear filters for each class from an existing pipeline. This application does not require camera simulations; instead, we can directly learn the L3 parameters from the sensor output and the corresponding rendered images. The rendered images can be those produced by the camera vendor, or they can be images generated by the user. The proposed method consists of two independent modules: 1) learning local linear kernels from a raw image and the corresponding rendered RGB image, and 2) rendering new raw images into the desired RGB output. The learning phase is conducted once per camera model, and the kernels are stored for future rendering. The rendering process is efficient, as it involves only loading the class definitions and kernels and applying them to generate the output images.
Kernel Learning

In general, our task is to find for each class a $P \times 3$ linear transformation (kernel) $T_i$ such that

$$T_i = \arg\min_{T_i} \sum_{j \in C_i} L(y_j, X_j T_i).$$

Here, $X_j$ and $y_j$ are the $j$th example pair of raw sensor data and rendered RGB image values for class $i$. The function $L$ specifies the loss function (visual error). In commercial imaging applications, the CIE ∆Eab visual difference measure can be a good choice for the loss function.

In image processing applications, the transformation from sensor to rendered data is globally nonlinear. But, as we show here, the global transformation can be well approximated by an affine transform for appropriately defined classes $C_i$. When the classes $C_i$ are determined, the transforms can be solved for each class independently. The problem can be expressed in the form of ordinary least squares. To avoid noise magnification in low-light situations, we use ridge regression and regularize the kernel coefficients. That is,

$$T_i = \arg\min_{T_i} \|\tilde{y} - X T_i\|^2 + \lambda \|T_i\|^2.$$

Here, $\lambda$ is the regularization parameter, and $\tilde{y}$ is the output in the target color space as an $N \times 3$ matrix. The sensor data in each local patch are reorganized as rows of $X$, which has $P$ columns, corresponding to the number of pixels in the sensor patch. The closed-form solution for this problem is

$$T_i = (X^T X + \lambda I)^{-1} X^T \tilde{y}.$$

The computation of $T_i$ can be further optimized by using the singular value decomposition (SVD) of $X$. That is, if we decompose $X = U D V^T$, we have

$$T_i = V \,\mathrm{diag}\!\left(\frac{D_j}{D_j^{2} + \lambda}\right) U^T \tilde{y}.$$

The regularization parameter $\lambda$ is chosen to minimize the generalized cross-validation (GCV) error [16]. We performed these calculations using several different target color spaces, including both the CIELAB and sRGB representations.

Patch Classification

To solve for the transforms $T_i$, the classes $C_i$ must be defined. The essential requirement for choosing classes is that the sensor data within a class can be accurately transformed to the response space. This can always be achieved by increasing the number of classes (i.e., shrinking the size of each class). In our experience, it is possible to achieve good local linearity by defining classes according to their mean response level, contrast, and saturation. The mean channel response estimates the illuminance at the sensor and characterizes the noise level. Contrast measures the local spatial variation, reflecting whether the local region of the scene is flat or textured. Finally, the saturation type checks for the case in which some of the channels no longer provide useful information; it is particularly important to separate classes with channel saturation.
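The closed-form ridge solution and its SVD shortcut can be checked on synthetic data. The sketch below is not the released L3 code; the patch size, synthetic class data, and fixed regularization value are assumptions, and the paper selects λ by generalized cross-validation rather than fixing it as done here.

```python
# Per-class kernel solution on synthetic data: ridge regression in closed form
# and the equivalent SVD shortcut (illustrative; parameters are placeholders).
import numpy as np

rng = np.random.default_rng(3)
N, P = 5000, 25                       # patches in one class, pixels per patch
X = rng.random((N, P))                # raw sensor patches (rows), one class C_i
T_true = rng.normal(size=(P, 3))
Y = X @ T_true + 0.01 * rng.normal(size=(N, 3))   # target renderings (N x 3)
lam = 1e-3                            # fixed here; GCV would choose it in practice

# Closed form: T_i = (X^T X + lambda I)^(-1) X^T Y
T_closed = np.linalg.solve(X.T @ X + lam * np.eye(P), X.T @ Y)

# Same solution via the SVD of X: T_i = V diag(d / (d^2 + lambda)) U^T Y
U, d, Vt = np.linalg.svd(X, full_matrices=False)
T_svd = Vt.T @ (np.diag(d / (d ** 2 + lam)) @ (U.T @ Y))

print(np.allclose(T_closed, T_svd))   # the two routes agree
```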


Rundbrief Der Gi-fachgruppe 5.10 Informationssystem-architekturen | 2015

ISETBIO: Computational tools for modeling early human vision

David H. Brainard; Haomiao Jiang; Nicolas P. Cottaris; Fred Rieke; E. J. Chichilnisky; Joyce E. Farrell; Brian A. Wandell

Vision begins with the formation of the retinal image, sampling by photoreceptors, and retinal processing of photoreceptor excitations. We describe software tools for modeling how these fundamental processes limit human performance on visual tasks.

Collaboration


Dive into Haomiao Jiang's collaborations.

Top Co-Authors

David H. Brainard (University of Pennsylvania)
Xiaomao Ding (University of Pennsylvania)
Ana Radonjić (University of Pennsylvania)