
Publication


Featured research published by Victor Fragoso.


Workshop on Applications of Computer Vision | 2011

TranslatAR: A mobile augmented reality translator

Victor Fragoso; Steffen Gauglitz; Shane Zamora; Jim Kleban; Matthew Turk

We present a mobile augmented reality (AR) translation system, using a smartphone's camera and touchscreen, that requires the user to simply tap on the word of interest once in order to produce a translation, presented as an AR overlay. The translation seamlessly replaces the original text in the live camera stream, matching background and foreground colors estimated from the source images. For this purpose, we developed an efficient algorithm for accurately detecting the location and orientation of the text in a live camera stream that is robust to perspective distortion, and we combine it with OCR and a text-to-text translation engine. Our experimental results, using the ICDAR 2003 dataset and our own set of video sequences, quantify the accuracy of our detection and analyze the sources of failure among the system's components. With the OCR and translation running in a background thread, the system runs at 26 fps on a current-generation smartphone (Nokia N900) and offers a particularly easy-to-use and simple method for translation, especially in situations in which typing or correct pronunciation (for systems with speech input) is cumbersome or impossible.
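
As a rough illustration of the pipeline, the sketch below wires together the stages the abstract describes (tap-driven detection, color estimation, OCR, translation, overlay). Every helper here is a stub of our own, not the authors' code; a real system would plug in a text detector, an OCR engine, and a translation service.

```python
# Hypothetical sketch of a tap-to-translate loop; all helpers are stubs.
import numpy as np

def detect_text_region(frame: np.ndarray, tap: tuple) -> tuple:
    """Placeholder detector: an axis-aligned box (x, y, w, h) around the tap."""
    x, y = tap
    return (max(x - 40, 0), max(y - 10, 0), 80, 20)

def estimate_colors(frame, box):
    """Estimate background/foreground colors from the region's pixels."""
    x, y, w, h = box
    patch = frame[y:y + h, x:x + w].reshape(-1, 3).astype(float)
    bg = np.median(patch, axis=0)                          # dominant background
    fg = patch[np.argmax(np.abs(patch - bg).sum(axis=1))]  # most contrasting pixel
    return bg.astype(np.uint8), fg.astype(np.uint8)

def on_tap(frame, tap, ocr=lambda patch: "Hallo", translate=lambda w: "Hello"):
    """Tap-to-translate: detect, read, translate, and paint the overlay."""
    x, y, w, h = detect_text_region(frame, tap)
    bg, fg = estimate_colors(frame, (x, y, w, h))
    word = ocr(frame[y:y + h, x:x + w])   # stub OCR on the detected patch
    frame[y:y + h, x:x + w] = bg          # erase the original text with bg color
    return translate(word), fg            # caller renders the text in fg color

frame = np.random.randint(0, 255, (480, 640, 3), dtype=np.uint8)
print(on_tap(frame, (320, 240)))
```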


International Conference on Computer Vision | 2013

EVSAC: Accelerating Hypotheses Generation by Modeling Matching Scores with Extreme Value Theory

Victor Fragoso; Pradeep Sen; Sergio Rodriguez; Matthew Turk

Algorithms based on RANSAC that estimate models using feature correspondences between images can slow down tremendously when the percentage of correct correspondences (inliers) is small. In this paper, we present a probabilistic parametric model that allows us to assign a confidence value to each matching correspondence and therefore accelerates the generation of hypothesis models for RANSAC under these conditions. Our framework leverages Extreme Value Theory to accurately model the statistics of matching scores produced by a nearest-neighbor feature matcher. Using a new algorithm based on this model, we are able to estimate accurate hypotheses with RANSAC at low inlier ratios significantly faster than previous state-of-the-art approaches, while still performing comparably when the number of inliers is large. We present results of homography and fundamental matrix estimation experiments for both SIFT and SURF matches that demonstrate that our method leads to accurate and fast model estimation.
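
To illustrate the guided-sampling idea concretely, the sketch below fits a generalized extreme value model to nearest-neighbor matching distances and uses the resulting confidences as sampling weights for minimal sets. It is a simplified stand-in: the paper fits a mixture model to the score statistics, and the distances here are synthetic.

```python
# Hedged sketch of EVT-weighted hypothesis sampling in the spirit of EVSAC.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Synthetic matching distances: a few inlier-like low values, many outliers.
scores = np.concatenate([rng.gamma(2.0, 0.05, 20),    # inlier-like distances
                         rng.gamma(9.0, 0.08, 180)])  # outlier-like distances

shape, loc, scale = stats.genextreme.fit(scores)
# Lower distance -> lower CDF under the fitted model -> higher confidence.
conf = 1.0 - stats.genextreme.cdf(scores, shape, loc=loc, scale=scale)
weights = conf / conf.sum()

def sample_minimal_set(k=4):
    """Draw a minimal sample (e.g., 4 correspondences for a homography)
    with probability proportional to the EVT-based confidences."""
    return rng.choice(len(scores), size=k, replace=False, p=weights)

print(sample_minimal_set())
```

Because likely-correct correspondences are drawn more often, all-inlier minimal samples appear after far fewer iterations than with uniform sampling when the inlier ratio is low.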


International Conference on Computer Vision | 2011

Automatic text detection for mobile augmented reality translation

Marc Petter; Victor Fragoso; Matthew Turk; Charles Baur

We present a fast automatic text detection algorithm devised for a mobile augmented reality (AR) translation system on a mobile phone. In this application, scene text must be detected, recognized, and translated into a desired language, and the translation is then displayed, properly overlaid on the real-world scene. In order to offer a fast automatic text detector, we focus the initial search on finding a single letter. Detecting one letter provides useful information that is processed with efficient rules to quickly find the remainder of a word. This approach allows all contiguous text regions in an image to be detected quickly. We also present a method that exploits the redundancy of the information contained in the video stream to remove false alarms. Our experimental results quantify the accuracy and efficiency of the algorithm and show the strengths and weaknesses of the method as well as its speed (about 160 ms, unoptimized, on a recent-generation smartphone). The algorithm is well suited for real-time, real-world applications.
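
The seed-and-grow strategy can be illustrated with connected components: detect one letter, then accept neighboring components of similar height on the same line as the rest of the word. The rules below are simplified stand-ins for the paper's efficient rules.

```python
# Toy illustration of the seed-and-grow idea on a binarized image.
import numpy as np
from scipy import ndimage

def grow_word(binary: np.ndarray, seed_xy: tuple) -> list:
    labels, _ = ndimage.label(binary)            # connected components = letters
    boxes = ndimage.find_objects(labels)
    seed_label = labels[seed_xy[1], seed_xy[0]]
    if seed_label == 0:
        return []                                # tap did not hit a component
    word = [seed_label]
    sy, _sx = boxes[seed_label - 1]
    seed_h = sy.stop - sy.start
    for lab, (yy, _xx) in enumerate(boxes, start=1):
        if lab in word:
            continue
        h = yy.stop - yy.start
        same_line = abs(yy.start - sy.start) < 0.5 * seed_h  # rough baseline test
        similar = 0.5 * seed_h <= h <= 2.0 * seed_h          # similar letter height
        if same_line and similar:
            word.append(lab)
    return sorted(word)

img = np.zeros((20, 60), dtype=np.uint8)
for x0 in (5, 15, 25, 35):                       # four fake "letters" on one line
    img[5:15, x0:x0 + 6] = 1
print(grow_word(img, (7, 10)))                   # -> labels of the whole word
```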


European Conference on Computer Vision | 2014

gDLS: A Scalable Solution to the Generalized Pose and Scale Problem

Chris Sweeney; Victor Fragoso; Tobias Höllerer; Matthew Turk

In this work, we present a scalable least-squares solution for computing a seven-degree-of-freedom similarity transform. Our method utilizes the generalized camera model to compute relative rotation, translation, and scale from four or more 2D-3D correspondences. In particular, structure and motion estimates from monocular cameras lack scale without specific calibration. As such, our method has applications in loop closure for visual odometry and in registering multiple structure-from-motion reconstructions where scale must be recovered. We formulate the generalized pose and scale problem as the minimization of a least-squares cost function and solve this minimization without iterations or initialization. Additionally, we obtain all minima of the cost function. The order of the polynomial system that we solve is independent of the number of points, allowing our overall approach to scale favorably. We evaluate our method experimentally on synthetic and real datasets and demonstrate that it produces more accurate similarity transform solutions than existing methods.
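
To make the cost function concrete, here is one plausible writing of the generalized pose-and-scale constraint and objective; the notation, and which frame the scale multiplies, are our assumptions and may not match the paper exactly.

```latex
% Sketch of a generalized pose-and-scale formulation (conventions assumed).
% Ray i has unit direction r_i and origin c_i; p_i is the matched 3D point;
% alpha_i is the unknown depth along the ray.
\alpha_i \,\mathbf{r}_i + s\,\mathbf{c}_i = R\,\mathbf{p}_i + \mathbf{t},
\qquad i = 1, \dots, n, \quad n \ge 4.

% Depths, translation, and scale are linear given R, so they can be
% eliminated, leaving a least-squares cost in the rotation alone:
J(R) = \sum_{i=1}^{n}
  \bigl\| \hat{\alpha}_i \mathbf{r}_i + \hat{s}\,\mathbf{c}_i
          - R\,\mathbf{p}_i - \hat{\mathbf{t}} \bigr\|^2,
```

where the hatted quantities denote the optimal linear unknowns as functions of R. Because those unknowns are eliminated linearly, the degree of the resulting polynomial system in the rotation parameters does not grow with n, which is what makes the approach scale favorably.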


Computer Vision and Pattern Recognition | 2013

SWIGS: A Swift Guided Sampling Method

Victor Fragoso; Matthew Turk

We present SWIGS, a Swift and efficient Guided Sampling method for robust model estimation from image feature correspondences. Our method leverages the accuracy of our new confidence measure (MR-Rayleigh), which assigns a correctness confidence to a putative correspondence in an online fashion. MR-Rayleigh is inspired by Meta-Recognition (MR), an algorithm that aims to predict when a classifier's outcome is correct. We demonstrate that by using a Rayleigh distribution, the prediction accuracy of MR can be improved considerably. Our experiments show that MR-Rayleigh tends to predict better than the often-used Lowe's ratio, Brown's ratio, and the standard MR under a range of imaging conditions. Furthermore, our homography estimation experiment demonstrates that SWIGS performs similarly to or better than other guided sampling methods while requiring fewer iterations, leading to fast and accurate model estimates.
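
A minimal sketch of an MR-Rayleigh-style confidence computation appears below, assuming distances where smaller is better; the exact fitting window and normalization used in the paper may differ.

```python
# Hedged sketch of a Rayleigh-based correctness confidence for one query
# feature: fit a Rayleigh model to the nearest non-match distances and score
# the best match by how unlikely a non-match would be at that distance.
import numpy as np
from scipy import stats

def mr_rayleigh_confidence(knn_distances: np.ndarray, k: int = 10) -> float:
    """knn_distances: ascending distances to the k+1 nearest neighbors.
    Treat all but the best as non-matches, fit a Rayleigh distribution,
    and return the probability that the best distance is NOT a non-match."""
    best, tail = knn_distances[0], knn_distances[1:k + 1]
    loc, scale = stats.rayleigh.fit(tail, floc=0.0)   # fix location at zero
    return 1.0 - stats.rayleigh.cdf(best, loc=loc, scale=scale)

d = np.sort(np.r_[0.12, np.random.default_rng(1).uniform(0.45, 0.9, 10)])
print(f"confidence the best match is correct: {mr_rayleigh_confidence(d):.3f}")
```

In a SWIGS-style sampler, these per-correspondence confidences would then serve as the weights that guide which correspondences enter each minimal sample.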


Workshop on Applications of Computer Vision | 2016

Eye-CU: Sleep pose classification for healthcare using multimodal multiview data

Carlos Torres; Victor Fragoso; Scott D. Hammond; Jeffrey Fried; B. S. Manjunath

Manual analysis of body poses of bed-ridden patients requires staff to continuously track and record patient poses. Two limitations in the dissemination of pose-related therapies are scarce human resources and unreliable automated systems. This work addresses these issues by introducing a new method and a new system for robust automated classification of sleep poses in an Intensive Care Unit (ICU) environment. The new method, coupled-constrained Least-Squares (cc-LS), uses multimodal and multiview (MM) data and finds the set of modality trust values that minimizes the difference between expected and estimated labels. The new system, Eye-CU, is an affordable multi-sensor modular system for unobtrusive data collection and analysis in healthcare. Experimental results indicate that the performance of cc-LS matches the performance of existing methods in ideal scenarios. This method outperforms the latest techniques in challenging scenarios by 13% for those with poor illumination and by 70% for those with both poor illumination and occlusions. Results also show that a reduced Eye-CU configuration can classify poses without pressure information with only a slight drop in its performance.
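
The trust-value idea can be illustrated with a small constrained least-squares problem: find non-negative modality weights that bring the fused label scores closest to the expected labels. The simplex constraint below is our simplification of the paper's coupled constraints.

```python
# Illustrative sketch of solving for per-modality trust weights.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n_samples, n_poses, n_modalities = 40, 3, 3    # e.g. RGB, depth, pressure
S = rng.random((n_modalities, n_samples, n_poses))        # per-modality scores
Y = np.eye(n_poses)[rng.integers(0, n_poses, n_samples)]  # one-hot true poses

def objective(w):
    fused = np.tensordot(w, S, axes=1)         # weighted sum over modalities
    return np.sum((fused - Y) ** 2)            # gap to the expected labels

res = minimize(objective, x0=np.full(n_modalities, 1 / n_modalities),
               bounds=[(0, 1)] * n_modalities,
               constraints={"type": "eq", "fun": lambda w: w.sum() - 1})
print("modality trust weights:", res.x.round(3))
```

Dropping a modality (such as pressure, in the reduced Eye-CU configuration) amounts to removing its slice of S and re-solving for the remaining weights.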


International Conference on 3D Vision | 2016

Large Scale SfM with the Distributed Camera Model

Chris Sweeney; Victor Fragoso; Tobias Höllerer; Matthew Turk

We introduce the distributed camera model, a novel model for Structure-from-Motion (SfM). This model describes image observations in terms of light rays with ray origins and directions rather than pixels. As such, the proposed model is capable of describing a single camera or multiple cameras simultaneously as the collection of all light rays observed. We show how the distributed camera model is a generalization of the standard camera model, and we describe a general formulation and solution to the absolute camera pose problem that works for standard or distributed cameras. The proposed method computes a solution that is up to 8 times more efficient and is robust to rotation singularities in comparison with gDLS [21]. Finally, this method is used in a novel large-scale incremental SfM pipeline where distributed cameras are accurately and robustly merged together. This pipeline is a direct generalization of traditional incremental SfM; however, instead of growing the reconstruction by adding one camera at a time, it is grown by adding one distributed camera at a time. Our pipeline produces highly accurate reconstructions efficiently by avoiding the need for many bundle adjustment iterations, and it is capable of computing a 3D model of Rome from over 15,000 images in just 22 minutes.
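
A toy sketch of the representation is shown below: observations stored as rays with origins and directions, with a single pinhole camera falling out as the special case where every origin is zero. The class and function names are ours, not from any published code.

```python
# Minimal sketch of observations as rays (origin + direction) rather than pixels.
from dataclasses import dataclass
import numpy as np

@dataclass
class RayObservations:
    origins: np.ndarray      # (n, 3) ray origins in the local frame
    directions: np.ndarray   # (n, 3) unit ray directions
    points: np.ndarray       # (n, 3) matched 3D world points

def from_pinhole(K: np.ndarray, pixels: np.ndarray, points: np.ndarray):
    """A single pinhole camera is the special case with all origins at zero."""
    rays = (np.linalg.inv(K) @ np.c_[pixels, np.ones(len(pixels))].T).T
    rays /= np.linalg.norm(rays, axis=1, keepdims=True)
    return RayObservations(np.zeros_like(rays), rays, points)

K = np.diag([500.0, 500.0, 1.0])
px = np.array([[320.0, 240.0], [100.0, 50.0]])
X = np.array([[0.0, 0.0, 5.0], [1.0, 2.0, 4.0]])
print(from_pinhole(K, px, X).directions)
```

A rig, or an entire partial reconstruction, is represented the same way, just with non-zero ray origins, which is what lets the pipeline merge whole distributed cameras in one step.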


Mobile Cloud Visual Media Computing | 2015

Computer Vision for Mobile Augmented Reality

Matthew Turk; Victor Fragoso

Mobile augmented reality (AR) employs computer vision capabilities in order to properly integrate the real and the virtual, whether that integration involves the user's location, object-based interaction, 2D or 3D annotations, or precise alignment of image overlays. Real-time vision technologies vital for the AR context include tracking, object and scene recognition, localization, and scene model construction. For mobile AR, which has limited computational resources compared with static computing environments, efficient processing is critical, as are considerations of power consumption (i.e., battery life), processing and memory limitations, lag, and the processing and display requirements of the foreground application. On the other hand, additional sensors (such as gyroscopes, accelerometers, and magnetometers) are typically available in the mobile context, and, unlike many traditional computer vision applications, user interaction is often available for feedback and disambiguation. In this chapter, we discuss the use of computer vision for mobile augmented reality and present work on a vision-based AR application (mobile sign detection and translation), a vision-supplied AR resource (indoor localization and pose estimation), and a low-level correspondence tracking and model estimation approach to increase the accuracy and efficiency of computer vision methods in augmented reality.


International Conference on Pattern Recognition | 2016

One-class slab support vector machine

Victor Fragoso; Walter J. Scheirer; João P. Hespanha; Matthew Turk

This work introduces the one-class slab SVM (OCSSVM), a one-class classifier that aims at improving the performance of the one-class SVM. The proposed strategy reduces the false positive rate and increases the accuracy of detecting instances from novel classes. To this end, it uses two parallel hyperplanes to learn the normal region of the decision scores of the target class. OCSSVM extends the one-class SVM since it can scale and learn non-linear decision functions via kernel methods. Experiments on two publicly available datasets show that OCSSVM can consistently outperform the one-class SVM and perform comparably to or better than other state-of-the-art one-class classifiers.
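
The slab decision rule can be mimicked post hoc on a standard one-class SVM by bracketing its decision scores between two thresholds, as sketched below. The paper instead learns the two parallel hyperplanes jointly in one optimization, so this is only an illustration of the decision rule, not the proposed method.

```python
# Post-hoc approximation of the "slab" behavior on one-class SVM scores.
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X_train = rng.normal(0, 1, (200, 2))                 # target-class samples
X_test = np.vstack([rng.normal(0, 1, (50, 2)),       # more targets
                    rng.normal(5, 1, (50, 2))])      # a novel class

ocsvm = OneClassSVM(kernel="rbf", nu=0.1).fit(X_train)
scores = ocsvm.decision_function(X_train)
lo, hi = np.quantile(scores, [0.05, 0.95])           # the "slab" on scores

def predict_slab(X):
    s = ocsvm.decision_function(X)
    return np.where((s >= lo) & (s <= hi), 1, -1)    # inside the slab = target

print("accuracy on targets:", (predict_slab(X_test[:50]) == 1).mean())
print("rejection of novel :", (predict_slab(X_test[50:]) == -1).mean())
```

The upper bound is what distinguishes the slab from a plain one-class SVM: samples that score suspiciously far inside the target region can also be rejected, which is how the false positive rate is reduced.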


Computer Vision and Pattern Recognition | 2014

Cascade of Box (CABOX) Filters for Optimal Scale Space Approximation

Victor Fragoso; Gaurav Srivastava; Abhishek Nagar; Zhu Li; Kyung-Mo Park; Matthew Turk

Local image features, such as blobs and corners, have proven to be very useful for several computer vision applications. However, for enabling applications such as visual search and augmented reality with near-real-time latency, blob detection can be quite computationally expensive due to numerous convolution operations. In this paper, we present a sparse convex formulation to determine a minimal set of box filters for a fast yet robust approximation to the Gaussian kernels used for blob detection. We call our feature detector the CABOX (CAscade of BOX) detector. Although box approximations to a filter have been studied in the literature, previous approaches suffer from one or more of the following problems: 1) ad hoc box filter design, 2) an inelegant trade-off between filter reconstruction quality and speed, and 3) limited experimental evaluation on very small datasets. This paper, on the other hand, contributes: 1) an elegant optimization approach to determine an optimal sparse set of box filters, and 2) a comprehensive experimental evaluation including a large-scale image matching experiment with about 16K matching and 170K non-matching image pairs. Our experimental results show a substantial overlap (89%) between the features detected with our proposed method and the popular Difference-of-Gaussians (DoG) approach, and yet CABOX is 44% faster. Moreover, the large-scale experiment shows that CABOX closely reproduces DoG's performance in an end-to-end feature detection and matching pipeline.
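
The flavor of the sparse formulation can be reproduced in a few lines: build a dictionary of centered box kernels and pick a sparse weighted subset that approximates a DoG kernel via an L1-regularized fit. The paper's convex program is more elaborate; the Lasso here is a stand-in.

```python
# Hedged sketch: approximate a DoG kernel with a sparse sum of box filters.
import numpy as np
from sklearn.linear_model import Lasso

size = 21
ax = np.arange(size) - size // 2
xx, yy = np.meshgrid(ax, ax)

def gaussian(sigma):
    g = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
    return g / g.sum()

target = (gaussian(2.4) - gaussian(1.6)).ravel()     # DoG kernel to approximate

# Dictionary: square centered box filters of odd widths 1..21.
boxes = []
for w in range(1, size + 1, 2):
    b = ((np.abs(xx) <= w // 2) & (np.abs(yy) <= w // 2)).astype(float)
    boxes.append((b / b.sum()).ravel())
D = np.stack(boxes, axis=1)                          # (pixels, n_boxes)

fit = Lasso(alpha=1e-6, fit_intercept=False, max_iter=50_000).fit(D, target)
chosen = np.flatnonzero(np.abs(fit.coef_) > 1e-8)
print("box widths used:", [2 * k + 1 for k in chosen])
print("coefficients   :", fit.coef_[chosen].round(4))
```

The speed advantage comes from the fact that each selected box filter costs only a handful of additions per pixel with an integral image, regardless of its width.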

Collaboration


An overview of Victor Fragoso's collaborations.

Top Co-Authors

Matthew Turk, University of California
Chris Sweeney, University of California
Deva Ramanan, Carnegie Mellon University
Pradeep Sen, University of California
Aayush Bansal, Carnegie Mellon University
Carlos Torres, University of California
Jeffrey Fried, Santa Barbara Cottage Hospital