Dmitry Kit
University of Bath
Publications
Featured research published by Dmitry Kit.
International Journal of Computer Vision | 2014
Ming Shao; Dmitry Kit; Yun Fu
It is expensive to obtain labeled real-world visual data for training supervised algorithms, so it is valuable to leverage existing databases of labeled data. However, the data in source databases are often obtained under conditions that differ from those in the new task. Transfer learning provides techniques for transferring learned knowledge from a source domain to a target domain by finding a mapping between them. In this paper, we discuss a method for projecting both source and target data into a generalized subspace in which each target sample can be represented as a combination of source samples. By enforcing a low-rank constraint during this transfer, the structures of the source and target domains are preserved. This approach has three benefits. First, good alignment between the domains is ensured, because only relevant data from some subspace of the source domain are used to reconstruct the data in the target domain. Second, the discriminative power of the source domain is naturally passed on to the target domain. Third, noisy information is filtered out during knowledge transfer. Extensive experiments on synthetic data, and on important computer vision problems such as face recognition and visual domain adaptation for object recognition, demonstrate the superiority of the proposed approach over existing, well-established methods.
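As a rough illustration of the low-rank idea described above, the sketch below reconstructs target samples from source samples with a nuclear-norm penalty, solved by proximal gradient descent with singular value thresholding. It assumes both domains have already been projected into a common subspace and is not the paper's full formulation, which learns the projection jointly; all names and parameters here are illustrative.

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding: proximal operator of the nuclear norm."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    s = np.maximum(s - tau, 0.0)
    return (U * s) @ Vt

def low_rank_reconstruction(Xs, Xt, lam=1.0, n_iter=200):
    """Find a low-rank coefficient matrix Z so that Xt is approximated by Xs @ Z.

    Xs: (d, n_s) source samples, Xt: (d, n_t) target samples, both assumed to
    be already projected into a common subspace. Solves
    min_Z 0.5 * ||Xt - Xs Z||_F^2 + lam * ||Z||_*  by proximal gradient descent.
    """
    Z = np.zeros((Xs.shape[1], Xt.shape[1]))
    step = 1.0 / (np.linalg.norm(Xs, 2) ** 2 + 1e-12)  # 1 / Lipschitz constant
    for _ in range(n_iter):
        grad = Xs.T @ (Xs @ Z - Xt)
        Z = svt(Z - step * grad, step * lam)
    return Z
```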
European Conference on Computer Vision | 2014
Yu Kong; Dmitry Kit; Yun Fu
The speed with which intelligent systems can react to an action depends on how soon the action can be recognized. The ability to recognize ongoing actions is critical in many applications, for example spotting criminal activity. It is challenging because decisions have to be made from partial videos of temporally incomplete action executions. In this paper, we propose a novel discriminative multi-scale model for predicting the action class from a partially observed video. The proposed model captures the temporal dynamics of human actions by explicitly considering the full history of observed features as well as features within smaller temporal segments. We develop a new learning formulation that captures the temporal evolution of the action and enforces label consistency between segments and the corresponding partial videos. Experimental results on two public datasets show that the proposed approach outperforms state-of-the-art action prediction methods.
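The sketch below illustrates the flavor of multi-scale representation described above: features pooled over the entire observed prefix of a video are concatenated with features pooled over smaller temporal segments. The function name and pooling choices are assumptions for illustration; the paper's structured learning formulation and label-consistency constraints are not reproduced here.

```python
import numpy as np

def multiscale_features(frame_feats, n_segments=4):
    """Build a multi-scale representation of a partially observed video.

    frame_feats: (T_obs, d) array of per-frame features observed so far.
    Returns the mean-pooled feature of the whole observed prefix concatenated
    with mean-pooled features of n_segments equal temporal chunks
    (zero vectors for chunks that contain no frames yet).
    """
    T, d = frame_feats.shape
    global_feat = frame_feats.mean(axis=0)
    bounds = np.linspace(0, T, n_segments + 1).astype(int)
    segment_feats = []
    for a, b in zip(bounds[:-1], bounds[1:]):
        if b > a:
            segment_feats.append(frame_feats[a:b].mean(axis=0))
        else:
            segment_feats.append(np.zeros(d))
    return np.concatenate([global_feat, *segment_feats])
```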
PLOS ONE | 2014
Dmitry Kit; Leor N. Katz; Brian Sullivan; Kat Snyder; Dana H. Ballard; Mary Hayhoe
Visual memory has been demonstrated to play a role in both visual search and attentional prioritization in natural scenes. However, it has been studied predominantly in experimental paradigms that use multiple two-dimensional images. Natural experience, by contrast, entails prolonged immersion in a limited number of three-dimensional environments. The goal of the present experiment was to recreate circumstances comparable to natural visual experience in order to evaluate the role of scene memory in guiding eye movements in a natural environment. Subjects performed a continuous visual-search task within an immersive virtual-reality environment over three days. We found that, as in two-dimensional contexts, viewers rapidly learn the locations of objects in the environment over time and use spatial memory to guide search. Incidental fixations did not provide an obvious benefit to subsequent search, suggesting that semantic contextual cues may often be just as efficient, or that many incidentally fixated items are not held in memory in the absence of a specific task. On the third day in the environment, previously searched items changed in color. These items were fixated with increased probability relative to control objects, suggesting that memory-guided prioritization (or surprise) may be a robust mechanism for attracting gaze to novel features of natural environments, in addition to task factors and simple spatial saliency.
Intelligent Robots and Systems | 2011
Dmitry Kit; Brian Sullivan; Dana H. Ballard
Detecting visual changes in environments is an important computation with many applications in robotics and computer vision. Security cameras, remotely operated vehicles, and sentry robots could all benefit from a robust change detection capability. We conjecture that, for a mobile camera system, the number of visual scenes experienced is limited (compared to the space of all possible scenes) and that the scenes do not frequently undergo major changes between observations. These assumptions can be exploited to ease the task of change detection and to reduce the computational complexity of processing visual information by using memory to store previous computations.
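A minimal sketch of the memory idea, assuming discrete locations and a precomputed feature vector per view: a stored representation is reused for comparison instead of recomputing everything, and is only refreshed when no change is detected. The threshold, update rule, and class name are illustrative, not the authors' implementation.

```python
import numpy as np

class SceneMemory:
    """Toy memory-based change detector (illustrative only)."""

    def __init__(self, threshold=0.5, alpha=0.1):
        self.memory = {}          # location -> stored feature vector
        self.threshold = threshold
        self.alpha = alpha        # rate for refreshing the stored representation

    def observe(self, location, features):
        """Return True if the view at `location` appears to have changed."""
        features = np.asarray(features, dtype=float)
        if location not in self.memory:
            self.memory[location] = features
            return False  # nothing to compare against yet
        stored = self.memory[location]
        distance = np.linalg.norm(features - stored) / (np.linalg.norm(stored) + 1e-12)
        changed = distance > self.threshold
        if not changed:
            # reuse and refine the cached representation instead of recomputing it
            self.memory[location] = (1 - self.alpha) * stored + self.alpha * features
        return changed
```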
Journal of Vision | 2016
Chia Ling Li; M. Pilar Aivar; Dmitry Kit; Matthew Tong; Mary Hayhoe
The role of memory in guiding attention allocation in daily behaviors is not well understood. In experiments with two-dimensional (2D) images, there is mixed evidence about the importance of memory. Because the stimulus context differs extensively between laboratory experiments and daily behaviors, we investigated the role of memory in visual search in both 2D and three-dimensional (3D) environments. A 3D immersive virtual apartment composed of two rooms was created, and a parallel 2D visual search experiment composed of snapshots from the 3D environment was developed. Eye movements were tracked in both experiments. Repeated searches for geometric objects were performed to assess the role of spatial memory. Subsequently, subjects searched for realistic context objects to test for incidental learning. Our results show that subjects learned the room-target associations in 3D, but less so in 2D. Gaze was increasingly restricted to relevant regions of the room with experience in both settings. Search for local contextual objects, however, was not facilitated by early experience. Incidental fixations to context objects do not necessarily benefit search performance. Together, these results demonstrate that memory for global aspects of the environment guides search by restricting the allocation of attention to likely regions, whereas task relevance determines what is learned from the active search experience. Behaviors in 2D and 3D environments are comparable, although there is greater use of memory in 3D.
Multisensory Research | 2013
Dana H. Ballard; Dmitry Kit; Constantin A. Rothkopf; Brian Sullivan
Cognition can appear complex because the brain is capable of an enormous repertoire of behaviors. However, this complexity can be greatly reduced when constraints of time and space are taken into account. The brain is constrained by the body to limit its goal-directed behaviors to just a few independent tasks over the scale of 1-2 minutes, and can pursue only a very small number of independent agendas. These limitations have been characterized from a number of different vantage points, such as attention, working memory, and dual-task performance. It may be possible to unify the disparate perspectives of all these methodologies if behaviors are seen as modular and hierarchically organized. From this vantage point, cognition has as a central problem the scheduling of behaviors to achieve short-term goals. Thus dual-task paradigms can be seen as studying the concurrent management of simultaneous, competing agendas. Attention can be seen as focusing on the decision whether to interrupt the current agenda or persevere. Working memory can be seen as the bookkeeping necessary to manage the state of the currently active agenda items.
Journal of Vision | 2013
Gabriel Diaz; Joseph L. Cooper; Dmitry Kit; Mary Hayhoe
Despite the growing popularity of virtual reality environments, few laboratories are equipped to investigate eye movements within these environments. This primer is intended to reduce the time and effort required to incorporate eye-tracking equipment into a virtual reality environment. We discuss issues related to the initial startup and provide algorithms necessary for basic analysis. Algorithms are provided for the calculation of gaze angle within a virtual world using a monocular eye-tracker in a three-dimensional environment. In addition, we provide algorithms for the calculation of the angular distance between the gaze and a relevant virtual object and for the identification of fixations, saccades, and pursuit eye movements. Finally, we provide tools that temporally synchronize gaze data and the visual stimulus and enable real-time assembly of a video-based record of the experiment using the Quicktime MOV format, available at http://sourceforge.net/p/utdvrlibraries/. This record contains the visual stimulus, the gaze cursor, and associated numerical data and can be used for data exportation, visual inspection, and validation of calculated gaze movements.
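The sketch below illustrates two of the basic computations mentioned above: the angular distance between the gaze ray and a virtual object, and a simple velocity-threshold labeling of gaze data. The 65 deg/s threshold and the function names are assumptions for illustration, not necessarily the values or routines provided in the released toolbox.

```python
import numpy as np

def angular_distance(gaze_dir, eye_pos, object_pos):
    """Angle (degrees) between the gaze ray and the direction to an object.

    gaze_dir: 3D gaze direction in world coordinates; eye_pos and object_pos
    are 3D positions in the same coordinate frame.
    """
    to_object = np.asarray(object_pos, float) - np.asarray(eye_pos, float)
    to_object /= np.linalg.norm(to_object)
    gaze_dir = np.asarray(gaze_dir, float) / np.linalg.norm(np.asarray(gaze_dir, float))
    cos_angle = np.clip(np.dot(gaze_dir, to_object), -1.0, 1.0)
    return np.degrees(np.arccos(cos_angle))

def classify_gaze(gaze_angles_deg, timestamps_s, saccade_threshold=65.0):
    """Label each inter-sample interval as saccade or fixation/pursuit.

    gaze_angles_deg: (T, 2) azimuth/elevation of gaze in degrees;
    timestamps_s: (T,) sample times in seconds. The threshold on angular
    velocity is a common heuristic, assumed here for illustration.
    """
    diffs = np.diff(np.asarray(gaze_angles_deg, float), axis=0)
    dt = np.diff(np.asarray(timestamps_s, float))
    speed = np.linalg.norm(diffs, axis=1) / dt
    labels = np.where(speed > saccade_threshold, "saccade", "fixation_or_pursuit")
    return speed, labels
```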
International Symposium on Neural Networks | 2014
Dmitry Kit; Yu Kong; Yun Fu
Can a machine tell us whether an image was taken in Beijing or New York? Automated identification of geographical coordinates based on image content is of particular importance to data mining systems, because geolocation provides a large source of context for other useful features of an image. However, successful localization of unannotated images requires a large collection of images that covers all possible locations. Brute-force searches over entire databases are costly in terms of computation and storage requirements, and achieve limited results. Knowing which visual features make a particular location unique or similar to other locations can be used to choose a better match between spatially distant locations. However, doing this at global scale is a challenging problem. In this paper we propose an online, unsupervised clustering algorithm called the Location Aware Self-Organizing Map (LASOM) for learning the similarity graph between different regions. The goal of LASOM is to select key features in specific locations so as to increase the accuracy of geotagging untagged images, while also reducing computational and storage requirements. Unlike other Self-Organizing Map algorithms, LASOM provides the means to learn a conditional distribution of visual features, conditioned on geospatial coordinates. We demonstrate that the generated map not only preserves important visual information, but also provides additional context in the form of visual similarity relationships between different geographical areas. We show how this information can be used to improve geotagging results when using large databases.
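As a toy illustration of a self-organizing map whose units carry both a visual prototype and a geographic coordinate, the sketch below performs one online update per image. It is not the LASOM algorithm from the paper; the mixing weight, learning rate, and neighborhood definition are all assumptions.

```python
import numpy as np

class LocationAwareSOM:
    """Toy SOM whose units store a visual prototype plus a geo coordinate."""

    def __init__(self, n_units, feat_dim, lr=0.1, sigma=1.0, geo_weight=0.5, seed=0):
        rng = np.random.default_rng(seed)
        self.visual = rng.normal(size=(n_units, feat_dim))   # visual prototypes
        self.geo = rng.uniform(-1, 1, size=(n_units, 2))     # normalized lat/lon
        self.lr, self.sigma, self.geo_weight = lr, sigma, geo_weight

    def update(self, feat, coord):
        """One online update: pick the best-matching unit using a mix of visual
        and geographic distance, then pull nearby units toward the sample."""
        feat, coord = np.asarray(feat, float), np.asarray(coord, float)
        d_vis = np.linalg.norm(self.visual - feat, axis=1)
        d_geo = np.linalg.norm(self.geo - coord, axis=1)
        bmu = np.argmin((1 - self.geo_weight) * d_vis + self.geo_weight * d_geo)
        # Gaussian neighborhood defined over the units' geographic positions
        h = np.exp(-np.linalg.norm(self.geo - self.geo[bmu], axis=1) ** 2
                   / (2 * self.sigma ** 2))
        self.visual += self.lr * h[:, None] * (feat - self.visual)
        self.geo += self.lr * h[:, None] * (coord - self.geo)
```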
Human-Computer Interaction with Mobile Devices and Services | 2016
Dmitry Kit; Brian T. Sullivan
Naturalistic eye movement behavior has been measured in a variety of scenarios [15], and eye movement patterns appear indicative of task demands [16]. However, systematic task classification of eye movement data is a relatively recent development [1,3,7]. Additionally, prior work has focused on classification of eye movements while viewing 2D screen-based imagery. In the current study, eye movements from eight participants were recorded with a mobile eye tracker. Participants performed five everyday tasks: making a sandwich, transcribing a document, walking in an office, walking on a city street, and playing catch with a flying disc [14]. Using only saccadic direction and amplitude time-series data, we trained a hidden Markov model for each task and classified unlabeled data by calculating the probability that each model could generate the observed sequence. We present accuracy and time-to-recognition results, demonstrating better-than-chance performance.
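A minimal sketch of the classification scheme described above, using the third-party hmmlearn library: one Gaussian HMM is fit per task on (direction, amplitude) saccade series, and an unlabeled sequence is assigned to the task whose model scores it highest. The number of hidden states and the covariance type are assumptions, not the settings reported in the paper.

```python
import numpy as np
from hmmlearn.hmm import GaussianHMM  # third-party: pip install hmmlearn

def train_task_models(task_sequences, n_states=3):
    """Fit one Gaussian HMM per task.

    task_sequences: dict mapping task name -> list of (T_i, 2) arrays of
    saccade (direction, amplitude) observations.
    """
    models = {}
    for task, seqs in task_sequences.items():
        X = np.vstack(seqs)
        lengths = [len(s) for s in seqs]
        m = GaussianHMM(n_components=n_states, covariance_type="diag", n_iter=100)
        m.fit(X, lengths)
        models[task] = m
    return models

def classify_sequence(models, sequence):
    """Assign an unlabeled (T, 2) saccade sequence to the task whose HMM
    gives it the highest log-likelihood."""
    scores = {task: m.score(sequence) for task, m in models.items()}
    return max(scores, key=scores.get), scores
```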
Graphics Interface | 2016
Shridhar Ravikumar; Colin Davidson; Dmitry Kit; Neill D. F. Campbell; Luca Benedetti; Darren Cosker
Marker-based performance capture is one of the most widely used approaches for facial tracking owing to its robustness. In practice, marker-based systems do not capture the performance with complete fidelity and often require subsequent manual adjustment to incorporate missing visual details. This problem persists even when using a larger number of markers. Tracking a large number of markers can also quickly become intractable due to issues such as occlusion, swapping, and merging of markers. We present a new approach for fitting blendshape models to motion-capture data that improves quality by exploiting information from sparse make-up patches in the video between the markers, while using fewer markers. Our method uses a classification-based approach that detects FACS Action Units and their intensities to assist the solver in predicting optimal blendshape weights while taking perceptual quality into consideration. Our classifier is independent of the performer; once trained, it can be applied to multiple performers. Given performances captured using a Head Mounted Camera (HMC), which provides 3D facial marker-based tracking and corresponding video, we fit accurate, production-quality blendshape models to these data, resulting in high-quality animations.
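As a rough sketch of the core fitting step, the code below solves for non-negative blendshape weights that best explain observed marker displacements, with an optional soft prior toward weights predicted by an Action Unit classifier. The formulation and all names are illustrative assumptions, not the production solver described in the paper.

```python
import numpy as np
from scipy.optimize import nnls

def solve_blendshape_weights(neutral, blendshapes, markers,
                             au_prior=None, prior_strength=0.0):
    """Non-negative least-squares fit of blendshape weights to 3D markers.

    neutral: (m, 3) marker positions of the neutral face,
    blendshapes: (k, m, 3) per-shape marker offsets from the neutral,
    markers: (m, 3) observed marker positions,
    au_prior: optional (k,) vector of predicted weights (e.g. from an Action
    Unit classifier) pulled in with strength prior_strength.
    All parameter names and the soft-prior formulation are hypothetical.
    """
    k, m, _ = blendshapes.shape
    A = blendshapes.reshape(k, -1).T            # (3m, k) basis matrix
    b = (markers - neutral).reshape(-1)         # (3m,) observed displacement
    if au_prior is not None and prior_strength > 0:
        # append soft constraints pulling each weight toward its predicted value
        A = np.vstack([A, prior_strength * np.eye(k)])
        b = np.concatenate([b, prior_strength * np.asarray(au_prior, float)])
    weights, _ = nnls(A, b)                     # non-negative blendshape weights
    return weights
```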