Publication


Featured research published by Mohamed A. Elgharib.


International Conference on Computer Graphics and Interactive Techniques | 2016

Painting style transfer for head portraits using convolutional neural networks

Ahmed Selim; Mohamed A. Elgharib; Linda Doyle

Head portraits are popular in traditional painting. Automating portrait painting is challenging, as the human visual system is sensitive to the slightest irregularities in human faces, and applying generic painting techniques often deforms facial structures. On the other hand, existing portrait painting techniques are mainly designed for the graphite style and/or are based on image analogies, which require an example painting as well as its original unpainted version; this limits their domain of applicability. We present a new technique for transferring the painting from one head portrait onto another. Unlike previous work, our technique requires only the example painting and is not restricted to a specific style. We impose novel spatial constraints by locally transferring the color distributions of the example painting, which better captures the painting texture and maintains the integrity of facial structures. We generate a solution through convolutional neural networks and present an extension to video, where motion is exploited to reduce temporal inconsistencies and the shower-door effect. Our approach transfers the painting style while maintaining the identity of the input photograph. In addition, it significantly reduces facial deformations compared with the state of the art.
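
The key idea of locally transferring color distributions can be illustrated with a small sketch: for each aligned patch, the input's per-channel mean and standard deviation are shifted toward those of the example painting. This is a minimal numpy illustration of that spatial constraint under the assumption of dense alignment, not the authors' full CNN-based pipeline; the patch size and function name are ours.

```python
import numpy as np

def local_color_transfer(input_img, example_img, patch=32, eps=1e-6):
    """Match per-patch color statistics of input_img to example_img.

    Both are float (H, W, 3) arrays in [0, 1], assumed spatially aligned
    (e.g., via facial landmarks). Illustrative only: the paper's method
    additionally uses CNN feature matching and video constraints.
    """
    out = input_img.copy()
    h, w, _ = input_img.shape
    for y in range(0, h, patch):
        for x in range(0, w, patch):
            src = input_img[y:y + patch, x:x + patch]
            ref = example_img[y:y + patch, x:x + patch]
            # shift and scale each channel so its local mean/std
            # match the example painting within this patch
            mu_s, sd_s = src.mean((0, 1)), src.std((0, 1)) + eps
            mu_r, sd_r = ref.mean((0, 1)), ref.std((0, 1))
            out[y:y + patch, x:x + patch] = (src - mu_s) / sd_s * sd_r + mu_r
    return np.clip(out, 0.0, 1.0)
```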


Computer Vision and Pattern Recognition | 2015

Video magnification in presence of large motions

Mohamed A. Elgharib; Mohamed Hefeeda; William T. Freeman

Video magnification reveals subtle variations that would otherwise be invisible to the naked eye. Current techniques require all motion in the video to be very small, which is unfortunately not always the case. Tiny yet meaningful motions are often combined with larger ones, such as the small vibrations of a gate as it rotates, or the microsaccades in a moving eye. We present a layer-based video magnification approach that can amplify small motions within large ones. An examined region/layer is temporally aligned and its subtle variations are magnified. Matting is used to magnify only the region of interest while maintaining the integrity of nearby sites. Results show that our approach handles larger motions and larger amplification factors, with significantly fewer artifacts than the state of the art.
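
The magnification step itself can be sketched as amplifying a temporally band-passed version of the already-aligned layer, in the spirit of Eulerian magnification. This is a minimal illustration assuming the layer has been aligned beforehand; the layer decomposition and matting of the actual method are not shown, and the frequency band and gain are placeholders.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def magnify_layer(frames, fps, f_lo, f_hi, alpha):
    """Amplify subtle temporal variations in an aligned region/layer.

    frames: float array (T, H, W) of the temporally aligned layer.
    Simplified sketch: band-pass each pixel's temporal signal and add
    it back with gain alpha.
    """
    b, a = butter(2, [f_lo, f_hi], btype="band", fs=fps)
    subtle = filtfilt(b, a, frames, axis=0)  # per-pixel temporal band-pass
    return frames + alpha * subtle           # boost the filtered variations
```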


ACM Multimedia | 2015

Gradient-based 2D-to-3D Conversion for Soccer Videos

Kiana Calagari; Mohamed A. Elgharib; Piotr Didyk; Alexandre Kaspar; Wojciech Matusik; Mohamed Hefeeda

Widespread adoption of 3D videos and technologies is hindered by the lack of high-quality 3D content. One promising solution to this problem is automated 2D-to-3D conversion. However, current conversion methods, while general, produce low-quality results with artifacts that are not acceptable to many viewers. We address this problem by showing how to construct a high-quality, domain-specific conversion method for soccer videos. We propose a novel, data-driven method that generates stereoscopic frames by transferring depth information from similar frames in a database of 3D stereoscopic videos. Creating a database of 3D stereoscopic videos with accurate depth is, however, very difficult. One of the key findings of this paper is that computer-generated content in current sports computer games can be used to build a high-quality 3D video reference database for 2D-to-3D conversion methods. Once we retrieve similar 3D video frames, our technique transfers depth gradients to the target frame while respecting object boundaries. It then computes depth maps from the gradients and generates the output stereoscopic video. We implement our method and validate it by conducting user studies that evaluate depth perception and visual comfort of the converted 3D videos. We show that our method produces high-quality 3D videos that are almost indistinguishable from videos shot with stereo cameras. In addition, our method significantly outperforms the current state-of-the-art method: for example, it achieves up to 20% improvement in perceived depth, which translates to improving the mean opinion score from Good to Excellent.
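
Reconstructing depth from transferred gradients amounts to solving a Poisson equation. Below is a minimal sketch of that step using a plain Jacobi iteration; it is our simplification (the paper's solver, boundary handling, and stereoscopic rendering are not reproduced), and the iteration count is arbitrary.

```python
import numpy as np

def depth_from_gradients(gx, gy, iters=2000):
    """Recover a depth map whose gradients approximate (gx, gy).

    Solves the Poisson equation laplacian(d) = div(gx, gy) with Jacobi
    iterations; gx, gy are float (H, W) arrays of transferred gradients.
    """
    # divergence of the gradient field (backward differences)
    div = np.zeros_like(gx)
    div[:, 1:] += gx[:, 1:] - gx[:, :-1]
    div[1:, :] += gy[1:, :] - gy[:-1, :]

    d = np.zeros_like(gx)
    for _ in range(iters):
        # each pixel becomes the average of its neighbours minus the
        # divergence term (periodic boundaries via np.roll, for brevity)
        nb = (np.roll(d, 1, 0) + np.roll(d, -1, 0) +
              np.roll(d, 1, 1) + np.roll(d, -1, 1))
        d = (nb - div) / 4.0
    return d
```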


Human Factors in Computing Systems | 2018

Crowd-Guided Ensembles: How Can We Choreograph Crowd Workers for Video Segmentation?

Alexandre Kaspar; Genevieve Patterson; Changil Kim; Yagiz Aksoy; Wojciech Matusik; Mohamed A. Elgharib

In this work, we propose two ensemble methods that leverage a crowd workforce to improve video annotation, with a focus on video object segmentation. Their shared principle is that while individual candidate results may be insufficient on their own, they often complement each other and can be combined into something better than any single result, in the very spirit of collaborative work. For one, we extend a standard polygon-drawing interface to allow workers to annotate negative space, and combine the work of multiple workers instead of relying on a single best one, as is commonly done in crowdsourced image segmentation. For the other, we present a method to combine multiple automatic propagation algorithms with the help of the crowd. Such combination requires an understanding of where the algorithms fail, which we gather using a novel coarse-scribble video annotation task. We evaluate our ensemble methods, discuss our design choices, and make our web-based crowdsourcing tools and results publicly available.
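
The ensemble principle for worker-drawn masks can be sketched as a per-pixel vote that also honors explicitly annotated negative space. This is a hypothetical simplification of the combination step, not the paper's exact weighting or interface.

```python
import numpy as np

def combine_worker_masks(positive_masks, negative_masks):
    """Fuse crowd annotations into one segmentation mask.

    positive_masks / negative_masks: lists of boolean (H, W) arrays where
    workers marked object pixels or explicitly-not-object pixels.
    """
    votes = np.zeros(positive_masks[0].shape, dtype=float)
    for m in positive_masks:
        votes += m.astype(float)   # evidence for the object
    for m in negative_masks:
        votes -= m.astype(float)   # evidence against (negative space)
    return votes > 0               # keep pixels with net positive support
```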


IEEE Transactions on Circuits and Systems for Video Technology | 2016

Retrieval in Long-Surveillance Videos Using User-Described Motion and Object Attributes

Gregory D. Castanon; Mohamed A. Elgharib; Venkatesh Saligrama; Pierre-Marc Jodoin

We present a content-based retrieval method for long surveillance videos in wide-area (airborne) and near-field (closed-circuit television, CCTV) imagery. Our goal is to retrieve video segments that match user-defined events of interest, with a focus on detecting objects moving along routes. The sheer size of surveillance video and the remote locations where it is acquired necessitate highly compressed representations that are also meaningful enough to support user-defined queries. To address these challenges, we archive long surveillance video through lightweight processing based on low-level, local spatiotemporal extraction of motion and object attributes. These are then hashed into an inverted index using locality-sensitive hashing. This local approach allows for query flexibility and leads to significant gains in compression. Our second task is to extract partial matches to user-created queries and assemble them into full matches using dynamic programming (DP). DP assembles the indexed low-level features into a video segment that matches the query route by exploiting causality. We examine CCTV and airborne footage, whose low contrast makes motion extraction more difficult. We generate robust motion estimates for airborne data using a tracklet-generation algorithm, while we use the Horn and Schunck approach for CCTV. Our approach handles long routes, low contrast, and occlusion. We derive bounds on the rate of false positives and demonstrate the effectiveness of the approach for counting, motion pattern recognition, and abandoned-object applications.
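
The archiving step, hashing local descriptors into an inverted index with locality-sensitive hashing, can be sketched with random-hyperplane LSH. The descriptor dimensionality, bit count, and class shape below are illustrative assumptions; the dynamic-programming assembly of partial matches is not shown.

```python
import numpy as np
from collections import defaultdict

class LSHIndex:
    """Toy random-hyperplane LSH index over local spatio-temporal descriptors."""

    def __init__(self, dim, n_bits=16, seed=0):
        rng = np.random.default_rng(seed)
        self.planes = rng.standard_normal((n_bits, dim))
        self.buckets = defaultdict(list)   # inverted index: hash key -> frame ids

    def _key(self, vec):
        # sign pattern against the random hyperplanes -> bucket id
        bits = (self.planes @ vec) > 0
        return np.packbits(bits).tobytes()

    def add(self, vec, frame_id):
        self.buckets[self._key(vec)].append(frame_id)

    def query(self, vec):
        return self.buckets.get(self._key(vec), [])
```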


European Conference on Computer Vision | 2018

A Dataset of Flash and Ambient Illumination Pairs from the Crowd

Yagiz Aksoy; Changil Kim; Petr Kellnhofer; Sylvain Paris; Mohamed A. Elgharib; Marc Pollefeys; Wojciech Matusik

Illumination is a critical element of photography and is essential for many computer vision tasks. Flash light is unique in the sense that it is a widely available tool for easily manipulating scene illumination. We present a dataset of thousands of ambient and flash illumination pairs to enable studying flash photography and other applications that can benefit from having separate illuminations. Unlike the typical use of crowdsourcing in generating computer vision datasets, we make use of the crowd to directly take the photographs that make up our dataset. As a result, our dataset covers a wide variety of scenes captured by many casual photographers. We detail the advantages and challenges of our approach to crowdsourcing, as well as the computational effort required to generate completely separate flash illuminations from the ambient light in an uncontrolled setup. We present a brief examination of illumination decomposition, a challenging and underconstrained problem in flash photography, to demonstrate the use of our dataset in a data-driven approach.
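
One thing such pairs enable is computing a flash-only image: in linear radiance, a flash photograph is approximately the sum of the ambient image and the flash contribution, so the latter follows by subtraction. Below is a minimal sketch under the assumptions of a static, aligned pair and a simple gamma of 2.2 (both assumptions are ours).

```python
import numpy as np

def flash_only(flash_img, ambient_img, gamma=2.2):
    """Estimate the pure-flash illumination from a flash/no-flash pair.

    Images are float (H, W, 3) arrays in [0, 1], assumed aligned and
    taken of a static scene.
    """
    flash_lin = np.power(flash_img, gamma)      # undo display gamma
    ambient_lin = np.power(ambient_img, gamma)
    diff = np.clip(flash_lin - ambient_lin, 0.0, 1.0)
    return np.power(diff, 1.0 / gamma)          # back to display space
```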


ACM Multimedia | 2017

Sports VR Content Generation from Regular Camera Feeds

Kiana Calagari; Mohamed A. Elgharib; Shervin Shirmohammadi; Mohamed Hefeeda

With the recent availability of commodity Virtual Reality (VR) products, immersive video content is receiving significant interest. However, producing high-quality VR content often requires upgrading the entire production pipeline, which is costly and time-consuming. In this work, we propose using video feeds from regular broadcasting cameras to generate immersive content. We utilize the motion of the main camera to generate a wide-angle panorama. Using various techniques, we remove the parallax and align all video feeds. We then overlay parts of each video feed on the main panorama using Poisson blending. We evaluated our technique on various sports, including basketball, ice hockey, and volleyball. Subjective studies show that most participants rated their immersive experience when viewing our generated content between Good and Excellent. In addition, most participants rated their sense of presence as similar to that of ground-truth content captured with a GoPro Omni 360 camera rig.
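
The compositing step, overlaying an aligned feed onto the panorama with Poisson blending, can be sketched with OpenCV's seamless cloning. The homography is assumed to be estimated beforehand, and the function below is a hypothetical simplification; parallax removal and feed alignment are not shown.

```python
import cv2
import numpy as np

def blend_feed_onto_panorama(panorama, feed, homography):
    """Warp one uint8 BGR camera feed into the panorama and Poisson-blend it.

    homography maps feed coordinates into panorama coordinates.
    """
    h, w = panorama.shape[:2]
    warped = cv2.warpPerspective(feed, homography, (w, h))
    mask = cv2.warpPerspective(
        np.full(feed.shape[:2], 255, np.uint8), homography, (w, h))
    # Poisson (seamless) blending centered on the warped region
    x, y, bw, bh = cv2.boundingRect(mask)
    center = (x + bw // 2, y + bh // 2)
    return cv2.seamlessClone(warped, panorama, mask, center, cv2.NORMAL_CLONE)
```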


IEEE Transactions on Multimedia | 2017

Data Driven 2D-to-3D Video Conversion for Soccer

Kiana Calagari; Mohamed A. Elgharib; Piotr Didyk; Alexandre Kaspar; Wojciech Matusik; Mohamed Hefeeda

Wide adoption of 3-D videos is hindered by the lack of high-quality 3-D content. One promising solution to this problem is data-driven 2-D-to-3-D video conversion, in which depth maps are learned from a large dataset of 2-D+depth images. However, current conversion methods, while general, produce low-quality results with artifacts that are not acceptable to many viewers. We propose a novel, data-driven method for 2-D-to-3-D video conversion that transfers depth gradients from a large database of 2-D+depth images. Capturing such databases, however, is complex and costly, especially for outdoor sports games. We address this problem by creating a synthetic database from computer games and showing that this synthetic database can effectively be used to convert real videos. We propose a spatio-temporal method to ensure the smoothness of the generated depth within individual frames and across successive frames. In addition, we present an object boundary detection method customized for 2-D-to-3-D conversion systems, which produces clear depth boundaries for players. We implement our method and validate it by conducting user studies that evaluate depth perception and visual comfort of the converted 3-D videos. We show that our method produces high-quality 3-D videos that are almost indistinguishable from videos shot with stereo cameras. In addition, our method significantly outperforms current state-of-the-art methods: for example, it achieves up to 20% improvement in perceived depth, which translates to improving the mean opinion score from Good to Excellent.
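
The temporal part of the spatio-temporal smoothing can be sketched as a per-pixel exponential blend of successive depth maps. This is a stand-in used for illustration only; the paper's consistency term and the boundary-aware handling of players are more involved.

```python
import numpy as np

def smooth_depth_temporally(depth_frames, weight=0.8):
    """Exponentially smooth per-pixel depth across successive frames.

    depth_frames: float array (T, H, W) of per-frame depth estimates.
    """
    smoothed = depth_frames.copy()
    for t in range(1, depth_frames.shape[0]):
        # blend the current estimate with the previous smoothed frame
        smoothed[t] = weight * smoothed[t - 1] + (1 - weight) * depth_frames[t]
    return smoothed
```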


arXiv: Computer Vision and Pattern Recognition | 2017

Large-scale, Fast and Accurate Shot Boundary Detection through Spatio-temporal Convolutional Neural Networks

Ahmed Mamdouh A. Hassanien; Mohamed A. Elgharib; Ahmed Selim; Mohamed Hefeeda; Wojciech Matusik


arXiv: Computer Vision and Pattern Recognition | 2018

On Learning Associations of Faces and Voices

Changil Kim; Hijung Valentina Shin; Tae-Hyun Oh; Alexandre Kaspar; Mohamed A. Elgharib; Wojciech Matusik

Collaboration


Dive into Mohamed A. Elgharib's collaborations.

Top Co-Authors

Wojciech Matusik (Massachusetts Institute of Technology)
Alexandre Kaspar (Massachusetts Institute of Technology)
Yagiz Aksoy (Middle East Technical University)
Ahmed Selim (University College Dublin)