Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Cem Keskin is active.

Publication


Featured research published by Cem Keskin.


International Conference on Computer Vision | 2011

Real time hand pose estimation using depth sensors

Cem Keskin; Furkan Kıraç; Yunus Emre Kara; Lale Akarun

This paper describes a depth image based real-time skeleton fitting algorithm for the hand, using an object recognition by parts approach, and the use of this hand modeler in an American Sign Language (ASL) digit recognition application. In particular, we created a realistic 3D hand model that represents the hand with 21 different parts. Random decision forests (RDF) are trained on synthetic depth images generated by animating the hand model, which are then used to perform per pixel classification and assign each pixel to a hand part. The classification results are fed into a local mode finding algorithm to estimate the joint locations for the hand skeleton. The system can process depth images retrieved from Kinect in real time at 30 fps. As an application of the system, we also describe a support vector machine (SVM) based recognition module for the ten digits of ASL based on our method, which attains a recognition rate of 99.9% on live depth images in real time.
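
Below is a minimal Python sketch of the two-stage inference this abstract describes: per-pixel part labels from a forest, followed by a local mode search over each part's 3D points. It is a stand-in, not the authors' code; scikit-learn's RandomForestClassifier replaces the RDF trained on synthetic renders, and random arrays replace real depth-difference features and back-projected pixel positions.

import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Toy training set: one 16-D feature vector per pixel, one of 21 part labels.
X_train = rng.normal(size=(2000, 16))
y_train = rng.integers(0, 21, size=2000)
forest = RandomForestClassifier(n_estimators=5, max_depth=8).fit(X_train, y_train)

def estimate_joints(features, points_3d, n_parts=21, iters=10, bandwidth=1.0):
    # Stage 1: per-pixel hand-part classification.
    labels = forest.predict(features)
    joints = np.full((n_parts, 3), np.nan)
    # Stage 2: a local mode search (plain mean shift) per part.
    for part in range(n_parts):
        pts = points_3d[labels == part]
        if len(pts) == 0:
            continue
        mode = pts.mean(axis=0)                       # start at the centroid
        for _ in range(iters):
            w = np.exp(-np.sum((pts - mode) ** 2, axis=1) / bandwidth ** 2)
            mode = (w[:, None] * pts).sum(axis=0) / w.sum()
        joints[part] = mode
    return joints

# Toy "frame": features and 3-D positions for 500 pixels.
joints = estimate_joints(rng.normal(size=(500, 16)), rng.normal(size=(500, 3)))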


European Conference on Computer Vision | 2012

Hand pose estimation and hand shape classification using multi-layered randomized decision forests

Cem Keskin; Furkan Kıraç; Yunus Emre Kara; Lale Akarun

Vision-based articulated hand pose estimation and hand shape classification are challenging problems. This paper proposes novel algorithms to perform these tasks using depth sensors. In particular, we introduce a novel randomized decision forest (RDF) based hand shape classifier, and use it in a novel multi-layered RDF framework for articulated hand pose estimation. This classifier assigns the input depth pixels to hand shape classes, and directs them to the corresponding hand pose estimators trained specifically for that hand shape. We introduce two novel types of multi-layered RDFs: Global Expert Network (GEN) and Local Expert Network (LEN), which achieve significantly better hand pose estimates than a single-layered skeleton estimator and generalize better to previously unseen hand poses. The novel hand shape classifier is also shown to be accurate and fast. The methods run in real time on the CPU, and can be ported to the GPU for further increase in speed.
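
A hedged sketch of the multi-layered routing idea follows, under strong simplifications: a first forest predicts a hand shape class, and a per-class expert then estimates the pose. The per-class experts here are stand-in regressors on toy data, not the paper's GEN or LEN networks; all data and model choices below are illustrative assumptions.

import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

rng = np.random.default_rng(0)
N_SHAPES, N_JOINTS = 4, 21

# Toy data: a feature vector per image, a hand-shape label, a pose target.
X = rng.normal(size=(800, 32))
shape = rng.integers(0, N_SHAPES, size=800)
pose = rng.normal(size=(800, N_JOINTS * 3))

# Layer 1: hand-shape classifier. Layer 2: one pose expert per shape class.
shape_clf = RandomForestClassifier(n_estimators=5).fit(X, shape)
experts = {s: RandomForestRegressor(n_estimators=5).fit(X[shape == s],
                                                        pose[shape == s])
           for s in range(N_SHAPES)}

def estimate_pose(x):
    # Route the input through the shape classifier to the matching expert.
    s = int(shape_clf.predict(x[None])[0])
    return experts[s].predict(x[None])[0].reshape(N_JOINTS, 3)

print(estimate_pose(rng.normal(size=32)).shape)   # (21, 3)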


Human Factors in Computing Systems | 2015

Accurate, Robust, and Flexible Real-time Hand Tracking

Toby Sharp; Cem Keskin; Jonathan Taylor; Jamie Shotton; David Kim; Christoph Rhemann; Ido Leichter; Alon Vinnikov; Yichen Wei; Daniel Freedman; Pushmeet Kohli; Eyal Krupka; Andrew W. Fitzgibbon; Shahram Izadi

We present a new real-time hand tracking system based on a single depth camera. The system can accurately reconstruct complex hand poses across a variety of subjects. It also allows for robust tracking, rapidly recovering from any temporary failures. Most uniquely, our tracker is highly flexible, dramatically improving upon previous approaches which have focused on front-facing close-range scenarios. This flexibility opens up new possibilities for human-computer interaction with examples including tracking at distances from tens of centimeters through to several meters (for controlling the TV at a distance), supporting tracking using a moving depth camera (for mobile scenarios), and arbitrary camera placements (for VR headsets). These features are achieved through a new pipeline that combines a multi-layered discriminative reinitialization strategy for per-frame pose estimation, followed by a generative model-fitting stage. We provide extensive technical details and a detailed qualitative and quantitative analysis.
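
The per-frame control flow, reinitialize-then-refine, can be sketched as follows. This is an illustrative Python sketch, not the described system: render_model, reinitialize, and the finite-difference refiner are hypothetical stand-ins for the paper's rendered hand model, multi-layered discriminative reinitializer, and generative model-fitting optimizer.

import numpy as np

rng = np.random.default_rng(0)

def render_model(pose):
    # Hypothetical stand-in for rendering the hand model at a given pose.
    return np.outer(pose, pose)

def reinitialize(frame):
    # Hypothetical stand-in for the discriminative per-frame reinitializer:
    # propose a couple of candidate poses from the image alone.
    return [rng.normal(size=8) for _ in range(2)]

def fit_energy(pose, frame):
    # Toy energy: discrepancy between the rendered model and the frame.
    return float(np.sum((render_model(pose) - frame) ** 2))

def refine(pose, frame, steps=5, eps=1e-3, lr=1e-3):
    # Crude finite-difference descent standing in for model fitting.
    pose = pose.copy()
    for _ in range(steps):
        grad = np.zeros_like(pose)
        for i in range(pose.size):
            d = np.zeros_like(pose); d[i] = eps
            grad[i] = (fit_energy(pose + d, frame) -
                       fit_energy(pose - d, frame)) / (2 * eps)
        pose -= lr * grad
    return pose

def track(frames, pose):
    for frame in frames:
        # Candidates: carry the previous pose forward AND reinitialize,
        # so tracking can recover quickly after temporary failures.
        candidates = [pose] + reinitialize(frame)
        refined = [refine(c, frame) for c in candidates]
        pose = min(refined, key=lambda p: fit_energy(p, frame))
        yield pose

poses = list(track(rng.normal(size=(3, 8, 8)), rng.normal(size=8)))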


User Interface Software and Technology | 2014

In-air gestures around unmodified mobile devices

Jie Song; Gábor Sörös; Fabrizio Pece; Sean Ryan Fanello; Shahram Izadi; Cem Keskin; Otmar Hilliges

We present a novel machine learning based algorithm extending the interaction space around mobile devices. The technique uses only the RGB camera now commonplace on off-the-shelf mobile devices. Our algorithm robustly recognizes a wide range of in-air gestures, supporting user variation, and varying lighting conditions. We demonstrate that our algorithm runs in real-time on unmodified mobile devices, including resource-constrained smartphones and smartwatches. Our goal is not to replace the touchscreen as primary input device, but rather to augment and enrich the existing interaction vocabulary using gestures. While touch input works well for many scenarios, we demonstrate numerous interaction tasks such as mode switches, application and task management, menu selection and certain types of navigation, where such input can be either complemented or better served by in-air gestures. This removes screen real-estate issues on small touchscreens, and allows input to be expanded to the 3D space around the device. We present results for recognition accuracy (93% test and 98% train), impact of memory footprint and other model parameters. Finally, we report results from preliminary user evaluations, discuss advantages and limitations and conclude with directions for future work.
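
As a rough illustration of per-frame gesture classification with temporal smoothing (one plausible reading of such a pipeline, not the paper's actual algorithm), the sketch below classifies downsampled grayscale patches with a forest and takes a majority vote over a short window; the features, data, and window length are all toy assumptions.

import numpy as np
from collections import Counter, deque
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
N_GESTURES = 5

# Toy training data: flattened 16x16 grayscale patches with gesture labels.
X = rng.random((1000, 16 * 16))
y = rng.integers(0, N_GESTURES, size=1000)
clf = RandomForestClassifier(n_estimators=10).fit(X, y)

window = deque(maxlen=7)   # about a quarter second at 30 fps

def classify_frame(gray_patch):
    # Per-frame prediction, smoothed by a majority vote over the window
    # so single noisy frames do not flip the recognized gesture.
    window.append(int(clf.predict(gray_patch.reshape(1, -1))[0]))
    return Counter(window).most_common(1)[0][0]

for _ in range(10):                      # simulated camera frames
    label = classify_frame(rng.random(16 * 16))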


International Conference on Computer Graphics and Interactive Techniques | 2016

Efficient and precise interactive hand tracking through joint, continuous optimization of pose and correspondences

Jonathan Taylor; Lucas Bordeaux; Thomas J. Cashman; Bob Corish; Cem Keskin; Toby Sharp; Eduardo Soto; David Sweeney; Julien P. C. Valentin; Benjamin Luff; Arran Haig Topalian; Erroll Wood; Sameh Khamis; Pushmeet Kohli; Shahram Izadi; Richard Banks; Andrew W. Fitzgibbon; Jamie Shotton

Fully articulated hand tracking promises to enable fundamentally new interactions with virtual and augmented worlds, but the limited accuracy and efficiency of current systems has prevented widespread adoption. Today's dominant paradigm uses machine learning for initialization and recovery followed by iterative model-fitting optimization to achieve a detailed pose fit. We follow this paradigm, but make several changes to the model-fitting, namely using: (1) a more discriminative objective function; (2) a smooth-surface model that provides gradients for non-linear optimization; and (3) joint optimization over both the model pose and the correspondences between observed data points and the model surface. While each of these changes may actually increase the cost per fitting iteration, we find a compensating decrease in the number of iterations. Further, the wide basin of convergence means that fewer starting points are needed for successful model fitting. Our system runs in real-time on CPU only, which frees up the commonly over-burdened GPU for experience designers. The hand tracker is efficient enough to run on low-power devices such as tablets. We can track up to several meters from the camera to provide a large working volume for interaction, even using the noisy data from current-generation depth cameras. Quantitative assessments on standard datasets show that the new approach exceeds the state of the art in accuracy. Qualitative results take the form of live recordings of a range of interactive experiences enabled by this new approach.
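
The lifted formulation, optimizing pose and correspondences jointly and continuously instead of alternating as ICP does, can be shown on a toy model. In the sketch below a sphere (centre plus radius) stands in for the paper's subdivision hand surface, and SciPy's least_squares stands in for the paper's optimizer; both substitutions are assumptions for illustration only.

import numpy as np
from scipy.optimize import least_squares

def surface(u, v, centre, r):
    # Smooth parametric surface (a sphere), differentiable in every argument.
    return centre + r * np.stack([np.sin(u) * np.cos(v),
                                  np.sin(u) * np.sin(v),
                                  np.cos(u)], axis=-1)

rng = np.random.default_rng(0)
data = surface(rng.uniform(0.1, np.pi - 0.1, 50),
               rng.uniform(0, 2 * np.pi, 50),
               np.array([1.0, -0.5, 2.0]), 1.5) + 0.01 * rng.normal(size=(50, 3))

def residuals(params, data):
    # Lifted objective: the model parameters AND the per-point
    # correspondences (u_i, v_i) live in one continuous optimization.
    n = len(data)
    centre, r = params[:3], params[3]
    u, v = params[4:4 + n], params[4 + n:]
    return (surface(u, v, centre, r) - data).ravel()

x0 = np.concatenate([np.zeros(3), [1.0],
                     np.full(50, np.pi / 2), np.zeros(50)])
sol = least_squares(residuals, x0, args=(data,))
print(sol.x[:4].round(2))   # fitted centre and radius (true: 1.0, -0.5, 2.0, 1.5)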


User Interface Software and Technology | 2016

Holoportation: Virtual 3D Teleportation in Real-time

Sergio Orts-Escolano; Christoph Rhemann; Sean Ryan Fanello; Wayne Chang; Adarsh Prakash Murthy Kowdle; Yury Degtyarev; David Kim; Philip Lindsley Davidson; Sameh Khamis; Mingsong Dou; Vladimir Tankovich; Charles T. Loop; Qin Cai; Philip A. Chou; Sarah Mennicken; Julien P. C. Valentin; Vivek Pradeep; Shenlong Wang; Sing Bing Kang; Pushmeet Kohli; Yuliya Lutchyn; Cem Keskin; Shahram Izadi

We present an end-to-end system for augmented and virtual reality telepresence, called Holoportation. Our system demonstrates high-quality, real-time 3D reconstructions of an entire space, including people, furniture and objects, using a set of new depth cameras. These 3D models can also be transmitted in real-time to remote users. This allows users wearing virtual or augmented reality displays to see, hear and interact with remote participants in 3D, almost as if they were present in the same physical space. From an audio-visual perspective, communicating and interacting with remote users edges closer to face-to-face communication. This paper describes the Holoportation technical system in full, its key interactive capabilities, the application scenarios it enables, and an initial qualitative study of using this new communication medium.


Computer Vision and Pattern Recognition | 2014

User-Specific Hand Modeling from Monocular Depth Sequences

Jonathan Taylor; Richard V. Stebbing; Varun Ramakrishna; Cem Keskin; Jamie Shotton; Shahram Izadi; Aaron Hertzmann; Andrew W. Fitzgibbon

This paper presents a method for acquiring dense nonrigid shape and deformation from a single monocular depth sensor. We focus on modeling the human hand, and assume that a single rough template model is available. We combine and extend existing work on model-based tracking, subdivision surface fitting, and mesh deformation to acquire detailed hand models from as few as 15 frames of depth data. We propose an objective that measures the error of fit between each sampled data point and a continuous model surface defined by a rigged control mesh, and uses as-rigid-as-possible (ARAP) regularizers to cleanly separate the model and template geometries. A key contribution is our use of a smooth model based on subdivision surfaces that allows simultaneous optimization over both correspondences and model parameters. This avoids the use of iterated closest point (ICP) algorithms which often lead to slow convergence. Automatic initialization is obtained using a regression forest trained to infer approximate correspondences. Experiments show that the resulting meshes model the user's hand shape more accurately than just adapting the shape parameters of the skeleton, and that the retargeted skeleton accurately models the user's articulations. We investigate the effect of various modeling choices, and show the benefits of using subdivision surfaces and ARAP regularization.
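
The shape of the objective, a data term plus a template-preserving regularizer, is easy to sketch. The code below is a deliberate simplification: the "mesh" is a toy 1-D chain with identity correspondences, and the regularizer is a rotation-free edge-preservation term rather than the paper's full ARAP energy.

import numpy as np

# Template: a toy chain of 10 vertices standing in for the control mesh.
template = np.stack([np.linspace(0, 1, 10), np.zeros(10), np.zeros(10)], axis=1)
edges = [(i, i + 1) for i in range(9)]

rng = np.random.default_rng(0)
observed = template + 0.05 * rng.normal(size=template.shape)   # toy scan

lam = 1.0                      # regularization weight
V = template.copy()            # vertices being optimized
for _ in range(200):           # gradient descent on the total energy
    grad = 2.0 * (V - observed)               # data term (identity matches)
    for j, k in edges:                        # edge-preservation term
        d = (V[j] - V[k]) - (template[j] - template[k])
        grad[j] += 2.0 * lam * d
        grad[k] -= 2.0 * lam * d
    V -= 0.05 * grad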


International Conference on Computer Graphics and Interactive Techniques | 2014

Learning to be a depth camera for close-range human capture and interaction

Sean Ryan Fanello; Cem Keskin; Shahram Izadi; Pushmeet Kohli; David Kim; David Sweeney; Antonio Criminisi; Jamie Shotton; Sing Bing Kang; Tim Paek

We present a machine learning technique for estimating absolute, per-pixel depth using any conventional monocular 2D camera, with minor hardware modifications. Our approach targets close-range human capture and interaction where dense 3D estimation of hands and faces is desired. We use hybrid classification-regression forests to learn how to map from near infrared intensity images to absolute, metric depth in real-time. We demonstrate a variety of human-computer interaction and capture scenarios. Experiments show an accuracy that outperforms a conventional light fall-off baseline, and is comparable to high-quality consumer depth cameras, but with a dramatically reduced cost, power consumption, and form-factor.
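
The hybrid classification-regression idea can be sketched as a coarse depth-bin classifier followed by a per-bin metric-depth regressor. The sketch below uses scikit-learn forests and synthetic features as stand-ins; the real system's IR features, calibration, and forest design are not reproduced.

import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.random((3000, 9))                    # toy per-pixel IR patch features
depth = 0.2 + 0.8 * X.mean(axis=1)           # toy metric depth in metres
bins = np.digitize(depth, [0.4, 0.6, 0.8])   # coarse depth ranges

# Stage 1: classify each pixel into a coarse depth bin.
clf = RandomForestClassifier(n_estimators=10).fit(X, bins)
# Stage 2: one metric-depth regressor per bin.
regs = {b: RandomForestRegressor(n_estimators=10).fit(X[bins == b],
                                                      depth[bins == b])
        for b in np.unique(bins)}

def predict_depth(x):
    b = int(clf.predict(x[None])[0])
    return float(regs[b].predict(x[None])[0])

print(predict_depth(rng.random(9)))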


Human Factors in Computing Systems | 2014

Type-hover-swipe in 96 bytes: a motion sensing mechanical keyboard

Stuart Taylor; Cem Keskin; Otmar Hilliges; Shahram Izadi; John Helmes

We present a new type of augmented mechanical keyboard, capable of sensing rich and expressive motion gestures performed both on and directly above the device. Our hardware comprises a low-resolution matrix of infrared (IR) proximity sensors interspersed between the keys of a regular mechanical keyboard. This results in coarse but high frame-rate motion data. We extend a machine learning algorithm, traditionally used for static classification only, to robustly support dynamic, temporal gestures. We propose the use of motion signatures, a technique that utilizes pairs of motion history images and a random forest based classifier to robustly recognize a large set of motion gestures on and directly above the keyboard. Our technique achieves a mean per-frame classification accuracy of 75.6% in leave-one-subject-out and 89.9% in half-test/half-training cross-validation. We detail our hardware and gesture recognition algorithm, provide performance and accuracy numbers, and demonstrate a large set of gestures designed to be performed with our device. We conclude with qualitative feedback from users, discussion of limitations and areas for future work.
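
A motion history image, the building block of the motion signatures described above, is simple to compute: each new frame decays the history and stamps currently moving pixels at full intensity, so one small image summarizes the recent trajectory. The sketch below assumes toy 4x16 frames; the real sensor layout, the paired-MHI signature, and the forest classifier are not reproduced.

import numpy as np

def update_mhi(mhi, prev, frame, thresh=0.2, decay=0.9):
    # Decay the old history, then overwrite moving pixels with 1.0.
    motion = np.abs(frame - prev) > thresh
    mhi = decay * mhi
    mhi[motion] = 1.0
    return mhi

rng = np.random.default_rng(0)
frames = rng.random((30, 4, 16))        # toy low-resolution proximity frames
mhi = np.zeros((4, 16))
for prev, frame in zip(frames, frames[1:]):
    mhi = update_mhi(mhi, prev, frame)
# mhi.ravel() (possibly paired with a second, faster-decaying MHI) would
# then feed a random forest classifier for per-frame gesture recognition.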


Computer Vision and Pattern Recognition | 2012

Randomized decision forests for static and dynamic hand shape classification

Cem Keskin; Furkan Kıraç; Yunus Emre Kara; Lale Akarun

This paper proposes a novel algorithm to perform hand shape classification using depth sensors, without relying on color or temporal information. Hence, the system is independent of lighting conditions and does not need a hand registration step. The proposed method uses randomized decision forests (RDF) to assign class labels to each pixel on a depth image, and the final class label is determined by voting. This method is shown to achieve 97.8% success rate on an American Sign Language (ASL) dataset consisting of 65k images collected from five subjects with a depth sensor. More experiments are conducted on a subset of the ChaLearn Gesture Dataset, consisting of a lexicon with static and dynamic hand shapes. The hands are found using motion cues and cropped using depth information, with a precision rate of 87.88% when there are multiple gestures, and 94.35% when there is a single gesture in the sample. The hand shape classification success rate is 94.74% on a small subset of nine gestures corresponding to a single lexicon. The success rate is 74.3% for the leave-one-subject-out scheme, and 67.14% when training is conducted on an external dataset consisting of the same gestures. The method runs on the CPU in real-time, and is capable of running on the GPU for further increase in speed.
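
The voting scheme is straightforward to sketch: a forest labels every depth pixel with a hand shape class and the majority label wins. scikit-learn and random features below are stand-ins for the paper's RDF and per-pixel depth features.

import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 12))            # toy per-pixel depth features
y = rng.integers(0, 10, size=5000)         # hand-shape label per pixel
forest = RandomForestClassifier(n_estimators=10).fit(X, y)

def classify_hand(pixel_features):
    votes = forest.predict(pixel_features)            # one label per pixel
    return int(np.bincount(votes.astype(int)).argmax())  # majority vote

print(classify_hand(rng.normal(size=(400, 12))))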

Collaboration


Dive into Cem Keskin's collaborations.

Top Co-Authors


Shahram Izadi

University College London
