Publication


Featured research published by Scott Cohen.


Computers in Physics | 1996

CVODE, a stiff/nonstiff ODE solver in C

Scott Cohen; Alan C. Hindmarsh

CVODE is a package written in C for solving initial value problems for ordinary differential equations. It provides the capabilities of two older Fortran packages, VODE and VODPK. CVODE solves both stiff and nonstiff systems, using variable-coefficient Adams and BDF methods. In the stiff case, options for treating the Jacobian of the system include dense and band matrix solvers, and a preconditioned Krylov (iterative) solver. In the highly modular organization of CVODE, the core integrator module is independent of the linear system solvers, and all operations on N-vectors are isolated in a module of vector kernels. A set of parallel extensions of CVODE, called PVODE, is being developed. CVODE is available from Netlib and comes with an extensive user guide.
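CVODE itself is a C library documented in its user guide; as a rough Python sketch of the stiff/nonstiff workflow it describes, the snippet below uses SciPy's wrapper of the older Fortran VODE solver (one of the packages CVODE supersedes) to switch between BDF and Adams methods. The test problem, tolerances, and step limits are illustrative choices, not taken from the paper.

```python
# Illustrative sketch only: CVODE itself is a C library. SciPy wraps the older
# Fortran VODE solver (whose capabilities CVODE provides), so the same
# stiff/nonstiff choice -- BDF vs. Adams -- can be shown from Python.
from scipy.integrate import ode

MU = 10.0  # Van der Pol stiffness parameter; an illustrative test problem

def van_der_pol(t, y):
    return [y[1], MU * (1.0 - y[0] ** 2) * y[1] - y[0]]

def solve(method):
    # method = "bdf" for stiff problems, "adams" for nonstiff ones.
    solver = ode(van_der_pol).set_integrator(
        "vode", method=method, rtol=1e-6, atol=1e-8, nsteps=50000)
    solver.set_initial_value([2.0, 0.0], 0.0)
    solver.integrate(20.0)
    return solver.y if solver.successful() else None

print("BDF   (stiff)   :", solve("bdf"))
print("Adams (nonstiff):", solve("adams"))
```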


international conference on computer vision | 1999

The Earth Mover's Distance under transformation sets

Scott Cohen; L. Guibas

The Earth Mover's Distance (EMD) is a distance measure between distributions with applications in image retrieval and matching. We consider the problem of computing a transformation of one distribution which minimizes its EMD to another. The applications discussed here include estimation of the size at which a color pattern occurs in an image, lighting-invariant object recognition, and point feature matching in stereo image pairs. We present a monotonically convergent iteration which can be applied to a large class of EMD-under-transformation problems, although the iteration may converge to only a locally optimal transformation. We also provide algorithms that are guaranteed to compute a globally optimal transformation for a few specific problems, including some EMD-under-translation problems.
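The iteration described here alternates between an optimal flow for the current transformation and an optimal transformation for the current flow. The sketch below illustrates that alternation under simplifying assumptions not made in the paper (equal-weight point sets, squared-Euclidean ground distance, translations only), where the flow step reduces to an assignment problem and the transformation step to the mean matched displacement.

```python
# Toy sketch of an alternating "flow / transformation" iteration in the spirit
# of EMD under transformation. Assumptions (mine, not the paper's): both
# distributions are equal-weight point sets, the ground distance is squared
# Euclidean, and the transformation is a pure translation.
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def emd_under_translation(src, dst, n_iters=50, tol=1e-9):
    translation = np.zeros(src.shape[1])
    prev_cost = np.inf
    for _ in range(n_iters):
        # Flow step: optimal matching for the current translation.
        cost = cdist(src + translation, dst, metric="sqeuclidean")
        rows, cols = linear_sum_assignment(cost)
        # Transformation step: for squared-Euclidean cost, the optimal
        # translation given a matching is the mean matched displacement.
        translation = (dst[cols] - src[rows]).mean(axis=0)
        total = cdist(src + translation, dst, "sqeuclidean")[rows, cols].sum()
        if prev_cost - total < tol:   # objective is monotone non-increasing
            break
        prev_cost = total
    return translation, total

rng = np.random.default_rng(0)
a = rng.normal(size=(40, 2))
b = a + np.array([3.0, -1.5]) + 0.05 * rng.normal(size=(40, 2))
t, cost = emd_under_translation(a, b)
print("recovered translation:", t, "matching cost:", cost)
```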


international conference on computer vision | 2015

Joint Object and Part Segmentation Using Deep Learned Potentials

Peng Wang; Xiaohui Shen; Zhe L. Lin; Scott Cohen; Brian L. Price; Alan L. Yuille

Segmenting semantic objects from images and parsing them into their respective semantic parts are fundamental steps towards detailed object understanding in computer vision. In this paper, we propose a joint solution that tackles semantic object and part segmentation simultaneously, in which higher object-level context is provided to guide part segmentation, and more detailed part-level localization is utilized to refine object segmentation. Specifically, we first introduce the concept of semantic compositional parts (SCP) in which similar semantic parts are grouped and shared among different objects. A two-stream fully convolutional network (FCN) is then trained to provide the SCP and object potentials at each pixel. At the same time, a compact set of segments can also be obtained from the SCP predictions of the network. Given the potentials and the generated segments, in order to explore long-range context, we finally construct an efficient fully connected conditional random field (FCRF) to jointly predict the final object and part labels. Extensive evaluation on three different datasets shows that our approach can mutually enhance the performance of object and part segmentation, and outperforms the current state-of-the-art on both tasks.
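As a toy sketch of the two-stream FCN idea (a shared encoder feeding separate per-pixel object and SCP heads), the PyTorch snippet below uses placeholder layer sizes and is not the architecture from the paper; its outputs merely stand in for the potentials that would be passed to the FCRF.

```python
# Minimal two-stream FCN sketch: one shared encoder, two 1x1-conv heads
# producing per-pixel object potentials and SCP (semantic compositional part)
# potentials. Channel sizes and depth are illustrative placeholders only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoStreamFCN(nn.Module):
    def __init__(self, n_object_classes=21, n_scp_classes=30):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        # 1x1 convolutions act as per-pixel classifiers for each stream.
        self.object_head = nn.Conv2d(64, n_object_classes, 1)
        self.scp_head = nn.Conv2d(64, n_scp_classes, 1)

    def forward(self, x):
        feats = self.encoder(x)
        size = x.shape[-2:]
        obj = F.interpolate(self.object_head(feats), size=size,
                            mode="bilinear", align_corners=False)
        scp = F.interpolate(self.scp_head(feats), size=size,
                            mode="bilinear", align_corners=False)
        return obj, scp  # per-pixel potentials for a downstream CRF

obj_pot, scp_pot = TwoStreamFCN()(torch.randn(1, 3, 64, 64))
print(obj_pot.shape, scp_pot.shape)
```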


computer vision and pattern recognition | 2017

Relationship Proposal Networks

Ji Zhang; Mohamed Elhoseiny; Scott Cohen; Walter Chang; Ahmed M. Elgammal

Image scene understanding requires learning the relationships between objects in the scene. A scene with many objects may have only a few interacting objects (e.g., in a party image with many people, only a handful might be speaking with each other). It would be inefficient to detect all relationships by first detecting every individual object and then classifying all pairs: not only is the number of pairs quadratic, but classification also requires a limited set of object categories, which is not scalable for real-world images. In this paper we address these challenges by using pairs of related regions in images to train a relationship proposer that, at test time, produces a manageable number of related regions. We name our model the Relationship Proposal Network (Rel-PN). Like object proposals, our Rel-PN is class-agnostic and thus scalable to an open vocabulary of objects. We demonstrate the ability of our Rel-PN to localize relationships with only a few thousand proposals. We demonstrate its performance on the Visual Genome dataset and compare it to baselines that we designed. We also conduct experiments on a smaller subset of 5,000 images with over 37,000 related regions and show promising results.
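Purely to illustrate the output format of relationship proposals (scored subject/object/union regions pruned to a manageable top-K), here is a toy sketch; the geometric scoring rule is a placeholder of mine, and the actual Rel-PN learns class-agnostic proposals from data rather than exhaustively scoring all pairs.

```python
# Toy illustration of producing a manageable set of relationship proposals
# from object proposal boxes. The distance-based score below is a placeholder;
# Rel-PN itself learns which region pairs are related.
import numpy as np

def union_box(a, b):
    # Boxes are (x1, y1, x2, y2); the union box covers both regions.
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))

def relationship_proposals(boxes, top_k=10):
    scored = []
    for i, a in enumerate(boxes):
        for j, b in enumerate(boxes):
            if i == j:
                continue
            # Placeholder compatibility score: nearby boxes score higher.
            ca = np.array([(a[0] + a[2]) / 2, (a[1] + a[3]) / 2])
            cb = np.array([(b[0] + b[2]) / 2, (b[1] + b[3]) / 2])
            score = 1.0 / (1.0 + np.linalg.norm(ca - cb))
            scored.append((score, (a, b, union_box(a, b))))
    scored.sort(key=lambda s: s[0], reverse=True)
    return [pair for _, pair in scored[:top_k]]  # keep a manageable subset

boxes = [(0, 0, 10, 10), (12, 0, 20, 10), (100, 100, 120, 130)]
for subj, obj, union in relationship_proposals(boxes, top_k=2):
    print(subj, obj, union)
```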


computer vision and pattern recognition | 2016

Interactive Segmentation on RGBD Images via Cue Selection

Jie Feng; Brian L. Price; Scott Cohen; Shih-Fu Chang

Interactive image segmentation is an important problem in computer vision with many applications including image editing, object recognition, and image retrieval. Most existing interactive segmentation methods operate only on color images; only a few recent works leverage depth information from low-cost sensors to improve interactive segmentation. While these methods achieve better results than color-based methods, they are still limited to either using depth as an additional color channel or simply combining depth with color in a linear way. We propose a novel interactive segmentation algorithm which can incorporate multiple feature cues, such as color, depth, and normals, in a unified graph cut framework to leverage these cues more effectively. A key contribution of our method is that it automatically selects a single cue to be used at each pixel, based on the intuition that only one cue is necessary to determine the segmentation label locally. This is achieved by optimizing over both segmentation labels and cue labels, using terms designed to decide where the segmentation and the cue labels should change. Our algorithm thus produces not only the segmentation mask but also a cue label map indicating where each cue contributes to the final result. Extensive experiments on five large-scale RGBD datasets show that our proposed algorithm performs significantly better than other color-based and RGBD-based algorithms, both in reducing the amount of user input and in increasing segmentation accuracy.
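A minimal sketch of the per-pixel cue-selection idea is shown below. It assumes each cue supplies a foreground/background cost pair per pixel and picks, at every pixel, the cue that discriminates most confidently; the actual method optimizes segmentation and cue labels jointly in a graph-cut framework with smoothness terms, which this local sketch omits.

```python
# Toy per-pixel cue selection: choose one cue per pixel, then label the pixel
# with whichever class that cue finds cheaper. Costs are random placeholders.
import numpy as np

rng = np.random.default_rng(0)
h, w = 4, 6
cues = ["color", "depth", "normal"]
# costs[cue, label, y, x]: label 0 = background, 1 = foreground.
costs = rng.random((len(cues), 2, h, w))

gap = np.abs(costs[:, 0] - costs[:, 1])      # per-cue confidence, (cue, h, w)
cue_map = gap.argmax(axis=0)                 # one cue selected per pixel
fg = np.take_along_axis(costs[:, 1], cue_map[None], axis=0)[0]
bg = np.take_along_axis(costs[:, 0], cue_map[None], axis=0)[0]
seg = (fg < bg).astype(int)                  # 1 = foreground

print("cue label map:\n", cue_map)           # which cue decided each pixel
print("segmentation mask:\n", seg)
```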


computer vision and pattern recognition | 2017

Depth from Defocus in the Wild

Huixuan Tang; Scott Cohen; Brian L. Price; Stephen N. Schiller; Kiriakos N. Kutulakos

We consider the problem of two-frame depth from defocus in conditions unsuitable for existing methods yet typical of everyday photography: a handheld cellphone camera, a small aperture, a non-stationary scene and sparse surface texture. Our approach combines a global analysis of image content—3D surfaces, deformations, figure-ground relations, textures—with local estimation of joint depth-flow likelihoods in tiny patches. To enable local estimation we (1) derive novel defocus-equalization filters that induce brightness constancy across frames and (2) impose a tight upper bound on defocus blur—just three pixels in radius—through an appropriate choice of the second frame. For global analysis we use a novel piecewise-spline scene representation that can propagate depth and flow across large irregularly-shaped regions. Our experiments show that this combination preserves sharp boundaries and yields good depth and flow maps in the face of significant noise, uncertainty, non-rigidity, and data sparsity.
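A simplified illustration of the defocus-equalization idea, assuming Gaussian blur kernels (the paper derives its own equalization filters, and the blur radii below are placeholders): blurring each frame with the other frame's kernel gives both frames the same total blur, so corresponding pixels become directly comparable.

```python
# Toy defocus equalization under a Gaussian-blur assumption: after convolving
# each frame with the other frame's defocus kernel, brightness constancy holds.
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(1)
scene = gaussian_filter(rng.random((64, 64)), 1.0)   # stand-in sharp scene

sigma1, sigma2 = 0.8, 2.0                 # per-frame defocus blur (assumed known)
frame1 = gaussian_filter(scene, sigma1)   # e.g. focused closer
frame2 = gaussian_filter(scene, sigma2)   # e.g. focused farther

# Equalize: convolve each frame with the *other* frame's blur kernel, so both
# end up with total blur sqrt(sigma1**2 + sigma2**2).
eq1 = gaussian_filter(frame1, sigma2)
eq2 = gaussian_filter(frame2, sigma1)

print("mean |frame1 - frame2| :", np.abs(frame1 - frame2).mean())
print("mean |eq1 - eq2|       :", np.abs(eq1 - eq2).mean())  # ~0 after equalization
```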


computer vision and pattern recognition | 2016

Two Illuminant Estimation and User Correction Preference

Dongliang Cheng; Abdelrahman Kamel; Brian L. Price; Scott Cohen; Michael S. Brown

This paper examines the problem of white-balance correction when a scene contains two illuminants. This is a two-step process: 1) estimate the two illuminants, and 2) correct the image. Existing methods attempt to estimate a spatially varying illumination map; however, the results are error prone and the resulting illumination maps are too low-resolution to be used for proper spatially varying white-balance correction. In addition, the spatially varying nature of these methods makes them computationally intensive. We show that this problem can be effectively addressed not by attempting to obtain a spatially varying illumination map, but instead by performing illumination estimation on large sub-regions of the image. Our approach is able to detect when distinct illuminants are present in the image and to measure these illuminants accurately. Since our proposed strategy is not suitable for spatially varying image correction, a user study is performed to see whether there is a preference for how the image should be corrected when two illuminants are present but only a global correction can be applied. The user study shows that when the illuminants are distinct, there is a preference for the outdoor illuminant to be corrected, resulting in a warmer final result. We use these collective findings to demonstrate an effective two-illuminant estimation scheme that produces corrected images that users prefer.
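A minimal sketch of per-sub-region illuminant estimation followed by a single global diagonal correction is given below; the gray-world estimator, the two-region split, and the synthetic image are placeholders of mine, not the paper's estimator or data.

```python
# Toy sketch: estimate an illuminant on large sub-regions, check whether the
# estimates are distinct, then apply one global (von Kries) correction.
import numpy as np

def gray_world(region):
    # Gray-world illuminant estimate: normalized mean RGB of the region.
    rgb = region.reshape(-1, 3).mean(axis=0)
    return rgb / np.linalg.norm(rgb)

rng = np.random.default_rng(0)
img = rng.random((100, 100, 3))
img[:, :50] *= np.array([1.0, 0.9, 0.6])   # warm (e.g. indoor) half
img[:, 50:] *= np.array([0.7, 0.8, 1.0])   # cool (e.g. outdoor) half

h, w, _ = img.shape
regions = [img[:, :w // 2], img[:, w // 2:]]           # large sub-regions
estimates = np.array([gray_world(r) for r in regions])
angle = np.degrees(np.arccos(np.clip(estimates[0] @ estimates[1], -1, 1)))
print("per-region estimates:\n", estimates)
print("angular difference: %.1f deg" % angle)          # large => two illuminants

# Global correction with one chosen illuminant (e.g. the user-preferred one).
chosen = estimates[1]
corrected = np.clip(img / chosen * chosen.mean(), 0, 1)
```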


computer vision and pattern recognition | 2017

Skeleton Key: Image Captioning by Skeleton-Attribute Decomposition

Yufei Wang; Zhe Lin; Xiaohui Shen; Scott Cohen; Garrison W. Cottrell

Recently, there has been a lot of interest in automatically generating descriptions for an image. Most existing language-model-based approaches for this task learn to generate an image description word by word in its original word order. However, for humans it is more natural to locate the objects and their relationships first, and then elaborate on each object, describing notable attributes. We present a coarse-to-fine method that decomposes the original image description into a skeleton sentence and its attributes, and generates the skeleton sentence and attribute phrases separately. With this decomposition, our method can generate more accurate and novel descriptions than the previous state of the art. Experimental results on the MS-COCO dataset and the larger-scale Stock3M dataset show that our algorithm yields consistent improvements across different evaluation metrics, especially on the SPICE metric, which has a much higher correlation with human ratings than the conventional metrics. Furthermore, our algorithm can generate descriptions of varied length, benefiting from the separate control of the skeleton and attributes. This enables image description generation that better accommodates user preferences.
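To make the skeleton/attribute decomposition concrete, here is a toy example; the paper derives the decomposition from sentence structure, whereas this sketch uses a fixed modifier list purely for illustration.

```python
# Toy skeleton/attribute decomposition of a caption. The fixed list of
# modifier words is a placeholder standing in for a syntactic analysis.
ATTRIBUTE_WORDS = {"young", "small", "brown", "wooden", "red", "fluffy"}

def decompose(caption):
    skeleton, attributes, pending = [], {}, []
    for word in caption.lower().split():
        if word in ATTRIBUTE_WORDS:
            pending.append(word)          # modifier waits for its head word
        else:
            skeleton.append(word)
            if pending:
                attributes[word] = pending
                pending = []
    return " ".join(skeleton), attributes

skeleton, attrs = decompose("a young boy throws a small red ball")
print("skeleton  :", skeleton)    # "a boy throws a ball"
print("attributes:", attrs)       # {"boy": ["young"], "ball": ["small", "red"]}
```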


meeting of the association for computational linguistics | 2016

Automatic Annotation of Structured Facts in Images

Mohamed Elhoseiny; Scott Cohen; Walter Chang; Brian L. Price; Ahmed M. Elgammal

Motivated by the application of fact-level image understanding, we present an automatic method for collecting structured visual facts from images with captions. Structured facts include attributed objects, actions, interactions, and positional information. The collected annotations take the form of fact-image pairs: a structured fact together with the image region containing it. Using a language-based approach, the proposed method is able to collect hundreds of thousands of visual fact annotations with an accuracy of 83% according to human judgment. Our method automatically collected more than 380,000 visual fact annotations and more than 110,000 unique visual facts from images with captions, and localized them in images, in less than one day of processing time on standard CPU platforms.
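As a rough illustration of mining structured facts from captions, the sketch below uses an off-the-shelf dependency parser (spaCy); it is a stand-in for, not a reproduction of, the paper's collection pipeline and fact schema.

```python
# Illustrative caption-to-fact extraction with spaCy dependency parses.
# Requires: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")

def extract_facts(caption):
    doc = nlp(caption)
    facts = []
    for tok in doc:
        if tok.dep_ == "amod":                        # attributed object
            facts.append((tok.head.lemma_, tok.lemma_))
        if tok.dep_ == "nsubj" and tok.head.pos_ == "VERB":
            verb = tok.head
            objs = [c for c in verb.children if c.dep_ == "dobj"]
            if objs:                                   # interaction: <s, v, o>
                facts.append((tok.lemma_, verb.lemma_, objs[0].lemma_))
            else:                                      # action: <s, v>
                facts.append((tok.lemma_, verb.lemma_))
    return facts

print(extract_facts("A small boy kicks a red ball on the grass"))
# e.g. [('boy', 'small'), ('ball', 'red'), ('boy', 'kick', 'ball')]
```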


international conference on computer vision | 2015

Beyond White: Ground Truth Colors for Color Constancy Correction

Dongliang Cheng; Brian L. Price; Scott Cohen; Michael S. Brown

A limitation in color constancy research is the inability to establish ground truth colors for evaluating corrected images. Many existing datasets contain images of scenes that include a color chart; however, only the chart's neutral colors (grayscale patches) are used to provide the ground truth for illumination estimation and correction. This is because the corrected neutral colors are known to lie along the achromatic line in the camera's color space (i.e., R = G = B), whereas the correct RGB values of the other color patches are not known. As a result, most methods estimate a 3×3 diagonal matrix that ensures only the neutral colors are correct. In this paper, we describe how to overcome this limitation. Specifically, we show that under certain illuminations, a diagonal 3×3 matrix is capable of correcting not only neutral colors but all the colors in a scene. This finding allows us to find the ground truth RGB values for the color chart in the camera's color space. We show how to use this information to correct all the images in existing datasets so that they have correct colors. Working from these new color-corrected datasets, we describe how to modify existing color constancy algorithms to perform better image correction.
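A small numeric sketch of the diagonal (von Kries) correction discussed above: dividing each channel by the illuminant maps neutral patches onto the achromatic line R = G = B. Whether the same diagonal also corrects chromatic patches is exactly the question the paper studies; the values below are illustrative placeholders.

```python
# Diagonal 3x3 white-balance correction on synthetic neutral patches.
import numpy as np

illuminant = np.array([0.9, 0.7, 0.5])             # camera-space illuminant RGB
neutral_patches = np.array([[0.2, 0.2, 0.2],        # gray-patch reflectances
                            [0.5, 0.5, 0.5],
                            [0.9, 0.9, 0.9]])
observed = neutral_patches * illuminant             # image values under the light

D = np.diag(1.0 / illuminant)                       # 3x3 diagonal correction
corrected = observed @ D
print(corrected)                                    # each row satisfies R = G = B
```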

Collaboration


Dive into Scott Cohen's collaborations.

Top Co-Authors

Michael S. Brown, National University of Singapore
Peng Wang, University of California
Yufei Wang, University of California