Till Kroeger
ETH Zurich
Publication
Featured research published by Till Kroeger.
European Conference on Computer Vision | 2016
Till Kroeger; Radu Timofte; Dengxin Dai; Luc Van Gool
Most recent works in optical flow extraction focus on accuracy and neglect time complexity. However, in real-life visual applications, such as tracking, activity detection and recognition, time complexity is critical. We propose a solution with very low time complexity and competitive accuracy for the computation of dense optical flow. It consists of three parts: (1) inverse search for patch correspondences; (2) dense displacement field creation through patch aggregation along multiple scales; (3) variational refinement. At the core of our Dense Inverse Search-based method (DIS) is the efficient search of correspondences inspired by the inverse compositional image alignment proposed by Baker and Matthews (2001, 2004). DIS is competitive on standard optical flow benchmarks. DIS runs at 300 Hz up to 600 Hz on a single CPU core (at 1024 \(\times\) 436 resolution; 42 Hz/46 Hz when including preprocessing: disk access, image re-scaling, gradient computation; more details in Sect. 3.1), reaching the temporal resolution of the human biological vision system. It is order(s) of magnitude faster than state-of-the-art methods in the same range of accuracy, making DIS ideal for real-time applications.
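The inverse search in step (1) can be illustrated with a minimal sketch, assuming a translation-only warp, a single patch, and a single scale. It follows the general inverse compositional idea (template gradients and Hessian are computed once, outside the loop) and is not the authors' implementation; the function names `bilinear`, `inverse_search`, and `scene`, the patch size, and the synthetic test image are all hypothetical choices made for illustration.

```python
import numpy as np

def bilinear(img, ys, xs):
    """Sample img at float coordinates (ys, xs) with bilinear interpolation."""
    h, w = img.shape
    ys = np.clip(ys, 0, h - 1.001)
    xs = np.clip(xs, 0, w - 1.001)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    fy, fx = ys - y0, xs - x0
    top = (1 - fx) * img[y0, x0] + fx * img[y0, x0 + 1]
    bot = (1 - fx) * img[y0 + 1, x0] + fx * img[y0 + 1, x0 + 1]
    return (1 - fy) * top + fy * bot

def inverse_search(img1, img2, cy, cx, half=4, iters=50):
    """Estimate the displacement (dy, dx) of the patch centred at (cy, cx)."""
    ys, xs = np.mgrid[cy - half:cy + half + 1, cx - half:cx + half + 1].astype(float)
    template = bilinear(img1, ys, xs)
    # Gradients and Hessian of the template: computed once, reused every iteration.
    gy = (bilinear(img1, ys + 1, xs) - bilinear(img1, ys - 1, xs)) / 2
    gx = (bilinear(img1, ys, xs + 1) - bilinear(img1, ys, xs - 1)) / 2
    G = np.stack([gy.ravel(), gx.ravel()], axis=1)   # (N, 2) steepest-descent images
    H_inv = np.linalg.inv(G.T @ G)
    u = np.zeros(2)
    for _ in range(iters):
        err = bilinear(img2, ys + u[0], xs + u[1]) - template
        du = H_inv @ (G.T @ err.ravel())
        u -= du                                      # inverse update of the warp
        if np.linalg.norm(du) < 1e-4:
            break
    return u

# Synthetic example (assumption): a smooth scene shifted by a known sub-pixel amount.
def scene(ys, xs):
    return np.sin(xs / 4.0) + np.sin(ys / 4.0)

grid_y, grid_x = np.mgrid[0:64, 0:64].astype(float)
true_shift = np.array([0.6, -0.8])                   # (dy, dx)
img1 = scene(grid_y, grid_x)
img2 = scene(grid_y - true_shift[0], grid_x - true_shift[1])
est = inverse_search(img1, img2, cy=31, cx=31)
```

In a dense method, many such patch displacements would be aggregated into a flow field and refined; this sketch only covers the per-patch search.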
European Conference on Computer Vision | 2014
Till Kroeger; Luc Van Gool
Registering image data to Structure from Motion (SfM) point clouds is widely used to find precise camera location and orientation with respect to a world model. In the case of videos, one constraint has previously been unexploited: temporal smoothness. Without temporal smoothness, the magnitude of the pose error in each frame of a video will often dominate the magnitude of frame-to-frame pose change. This hinders the application of methods requiring stable pose estimates (e.g. tracking, augmented reality). We incorporate temporal constraints into the image-based registration setting and solve the problem by pose regularization with model fitting and smoothing methods. This leads to accurate, gap-free and smooth poses for all frames. We evaluate different methods on challenging synthetic and real street-view SfM data for varying scenarios of motion speed, outlier contamination, pose estimation failures and 2D-3D correspondence noise. For all test cases, a 2- to 60-fold reduction in root mean squared (RMS) positional error is observed, depending on pose estimation difficulty. For varying scenarios, different methods perform best. We give guidance on which methods should be preferred depending on circumstances and requirements.
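As a rough illustration of how model fitting can regularize per-frame poses, the sketch below fits a low-order polynomial to noisy camera positions and compares RMS positional error before and after. The polynomial model, the noise level, the simulated failure gap, and the names `fit_trajectory` and `rms` are assumptions for this example only; the abstract does not specify which fitting or smoothing methods were evaluated, and orientation is ignored here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic ground-truth camera positions along a smooth path (assumption).
t = np.linspace(0.0, 1.0, 100)
true_pos = np.stack([10 * t, 2 * t**2, np.zeros_like(t)], axis=1)

# Independent per-frame estimates: noisy, with a gap of failed frames.
noisy = true_pos + rng.normal(scale=0.5, size=true_pos.shape)
valid = np.ones(len(t), dtype=bool)
valid[40:50] = False            # simulated pose estimation failures
noisy[~valid] = np.nan

def fit_trajectory(t, positions, valid, degree=2):
    """Fit one polynomial per coordinate on valid frames, evaluate on all frames."""
    smooth = np.empty_like(positions)
    for axis in range(positions.shape[1]):
        coeffs = np.polyfit(t[valid], positions[valid, axis], degree)
        smooth[:, axis] = np.polyval(coeffs, t)
    return smooth

def rms(a, b):
    """Root mean squared positional (Euclidean) error between two trajectories."""
    return float(np.sqrt(np.mean(np.sum((a - b) ** 2, axis=1))))

smooth = fit_trajectory(t, noisy, valid)
rms_raw = rms(noisy[valid], true_pos[valid])   # only frames with an estimate
rms_smooth = rms(smooth, true_pos)             # all frames, gap included
```

Because the fitted model is evaluated at every timestamp, frames without a raw estimate also receive a pose, which is one simple way to obtain a gap-free trajectory.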
Computer Vision and Pattern Recognition | 2015
Dengxin Dai; Till Kroeger; Radu Timofte; Luc Van Gool
Metric learning has proved very successful. However, human annotations are necessary. In this paper, we propose an unsupervised method, dubbed Metric Imitation (MI), where metrics over cheap features (target features, TFs) are learned by imitating the standard metrics over more sophisticated, off-the-shelf features (source features, SFs) by transferring view-independent property manifold structures. In particular, MI consists of: 1) quantifying the properties of source metrics as manifold geometry, 2) transferring the manifold from source domain to target domain, and 3) learning a mapping of TFs so that the manifold is approximated as well as possible in the mapped feature domain. MI is useful in at least two scenarios where: 1) TFs are more efficient computationally and in terms of memory than SFs; and 2) SFs contain privileged information, but are not available during testing. For the former, MI is evaluated on image clustering, category-based image retrieval, and instance-based object retrieval, with three SFs and three TFs. For the latter, MI is tested on the task of example-based image super-resolution, where high-resolution patches are taken as SFs and low-resolution patches as TFs. Experiments show that MI is able to provide good metrics while avoiding expensive data labeling efforts and that it achieves state-of-the-art performance for image super-resolution. In addition, manifold transfer is an interesting direction of transfer learning.
Computer Vision and Pattern Recognition | 2015
Till Kroeger; Dengxin Dai; Luc Van Gool
We present a novel vanishing point (VP) detection and tracking algorithm for calibrated monocular image sequences. Previous VP detection and tracking methods usually assume known camera poses for all frames or detect and track separately. We advance the state-of-the-art by combining VP extraction on a Gaussian sphere with recent advances in multi-target tracking on probabilistic occupancy fields. The solution is obtained by solving a Linear Program (LP). This enables the joint detection and tracking of multiple VPs over sequences. Unlike existing works we do not need known camera poses, and at the same time avoid detecting and tracking in separate steps. We also propose an extension to enforce VP orthogonality. We augment an existing video dataset consisting of 48 monocular videos with multiple annotated VPs in 14448 frames for evaluation. Although the method is designed for unknown camera poses, it is also helpful in scenarios with known poses, since a multi-frame approach in VP detection helps to regularize in frames with weak VP line support.
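The Gaussian-sphere representation mentioned above can be sketched as follows, assuming a calibrated pinhole camera. Each image line segment defines a plane through the camera centre, whose unit normal corresponds to a great circle on the sphere; a vanishing direction is orthogonal to the normals of all lines that share it. This covers only that geometric step, not the LP-based joint detection and tracking; the intrinsics `K`, the helper names, and the synthetic 3D lines are hypothetical.

```python
import numpy as np

# Hypothetical camera intrinsics (focal length 800 px, principal point 320, 240).
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])

def project(K, X):
    """Project a 3D point in camera coordinates to pixel coordinates."""
    x = K @ X
    return x[:2] / x[2]

def great_circle_normal(K, p1, p2):
    """Unit normal of the plane through the camera centre and an image segment."""
    K_inv = np.linalg.inv(K)
    r1 = K_inv @ np.array([p1[0], p1[1], 1.0])
    r2 = K_inv @ np.array([p2[0], p2[1], 1.0])
    n = np.cross(r1, r2)
    return n / np.linalg.norm(n)

def vanishing_direction(normals):
    """Direction most orthogonal to all normals: last right singular vector."""
    _, _, vt = np.linalg.svd(np.asarray(normals))
    return vt[-1]

# Three parallel 3D lines with a known direction, all in front of the camera.
d_true = np.array([1.0, 0.2, 0.5])
d_true /= np.linalg.norm(d_true)
starts = [np.array([0.0, 0.0, 5.0]),
          np.array([1.0, 1.0, 6.0]),
          np.array([-1.0, 0.5, 4.0])]

normals = [great_circle_normal(K, project(K, X0), project(K, X0 + 2.0 * d_true))
           for X0 in starts]
d_est = vanishing_direction(normals)   # recovered up to sign
```

With noisy line segments the normals no longer intersect exactly, and the singular-vector solution becomes a least-squares estimate.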
Lecture Notes in Computer Science | 2014
Till Kroeger; Ralf Dragon; Luc Van Gool
In this paper we present a novel variational model to jointly estimate geometry and motion from a sequence of light fields captured with a plenoptic camera. The proposed model uses the so-called sub-aperture representation of the light field. Sub-aperture images represent images with slightly different viewpoints, which can be extracted from the light field. The sub-aperture representation allows us to formulate a convex global energy functional, which enforces multi-view geometry consistency, and piecewise smoothness assumptions on the scene flow variables. We optimize the proposed scene flow model by using an efficient preconditioned primal-dual algorithm. Finally, we also present synthetic and real world experiments.
Computer Vision and Pattern Recognition | 2017
Pablo Speciale; Danda Pani Paudel; Martin R. Oswald; Till Kroeger; Luc Van Gool; Marc Pollefeys
Consensus maximization has proven to be a useful tool for robust estimation. While randomized methods like RANSAC are fast, they do not guarantee global optimality and fail to manage large amounts of outliers. On the other hand, global methods are commonly slow because they do not exploit the structure of the problem at hand. In this paper, we show that the solution space can be reduced by introducing Linear Matrix Inequality (LMI) constraints. This leads to significant speed-ups of the optimization time even for large amounts of outliers, while maintaining global optimality. We study several cases in which the objective variables have a special structure, such as rotation, scaled-rotation, and essential matrices, which are posed as LMI constraints. This is very useful in several standard computer vision problems, such as estimating Similarity Transformations, Absolute Poses, and Relative Poses, for which we obtain compelling results on both synthetic and real datasets. With up to 90 percent outlier rate, where RANSAC often fails, our constrained approach is consistently faster than the non-constrained one, while finding the same global solution.
2015 IEEE Winter Applications and Computer Vision Workshops | 2015
Till Kroeger; Dengxin Dai; Radu Timofte; Luc Van Gool
While vanishing point (VP) estimation has received extensive attention, most approaches focus on static images or perform detection and tracking separately. In this paper, we focus on man-made environments and propose a novel method for detecting and tracking groups of mutually orthogonal vanishing points (MOVP), also known as Manhattan frames, jointly from monocular videos. The method is unique in that it is designed to enforce orthogonality in groups of VPs, temporal consistency of each individual MOVP, and orientation consistency of all putative MOVPs. To this end, the method consists of three steps: 1) proposal of MOVP candidates by directly incorporating mutual orthogonality; 2) extracting consistent tracks of MOVPs by minimizing the flow cost over a network where nodes are putative MOVPs and edges are putative links across time; and 3) refinement of all MOVPs by enforcing consistency between lines, their identified vanishing directions and consistency of global camera orientation. The method is evaluated on six newly collected and annotated videos of urban scenes. Extensive experiments show that the method considerably outperforms a greedy MOVP tracking method. In addition, we also test the method for camera orientation estimation and show that it obtains very promising results on a challenging street-view dataset.
German Conference on Pattern Recognition | 2014
Till Kroeger; Ralf Dragon; Luc Van Gool
We propose a new tracking-by-detection algorithm for multiple targets from multiple dynamic, unlocalized and unconstrained cameras. In the past, tracking has been done either with multiple static cameras or with single and stereo dynamic cameras. We register several moving cameras using a given 3D model from Structure from Motion (SfM), and initialize the tracking given the registration. The camera uncertainty estimate can be efficiently incorporated into a flow-network formulation for tracking. As this is a novel task in the tracking domain, we evaluate our method on a new challenging dataset for tracking with multiple moving cameras and show that our tracking method can effectively deal with independently moving cameras and camera registration noise.
International Conference on Computer Graphics and Interactive Techniques | 2017
Kenneth Vanhoey; Carlos Eduardo Porto de Oliveira; Hayko Riemenschneider; András Bódis-Szomorú; Santiago Manen; Danda Pani Paudel; Michael Gygli; Nikolay Kobyshev; Till Kroeger; Dengxin Dai; Luc Van Gool
VarCity - the Video is a short documentary-style CGI movie explaining the main outcomes of the 5-year Computer Vision research project VarCity. Besides a coarse overview of the research, we present the challenges that were faced in its production, induced by two factors: i) usage of imperfect research data produced by automatic algorithms, and ii) human factors, like federating researchers and a CG artist around a common goal that many had a different conception of, while no one had a detailed overview of all the content. Successful completion was driven by some ad-hoc technical developments but, more importantly, by detailed and abundant communication and agreement on common best practices.
arXiv: Computer Vision and Pattern Recognition | 2018
Dengxin Dai; Wen Li; Till Kroeger; Luc Van Gool