Piotr Bojanowski
French Institute for Research in Computer Science and Automation
Publications
Featured research published by Piotr Bojanowski.
international conference on computer vision | 2013
Piotr Bojanowski; Francis R. Bach; Ivan Laptev; Jean Ponce; Cordelia Schmid; Josef Sivic
We address the problem of learning a joint model of actors and actions in movies using weak supervision provided by scripts. Specifically, we extract actor/action pairs from the script and use them as constraints in a discriminative clustering framework. The corresponding optimization problem is formulated as a quadratic program under linear constraints. People in video are represented by automatically extracted and tracked faces together with corresponding motion features. First, we apply the proposed framework to the task of learning names of characters in the movie and demonstrate significant improvements over previous methods used for this task. Second, we explore the joint actor/action constraint and show its advantage for weakly supervised action learning. We validate our method in the challenging setting of localizing and recognizing characters and their actions in the feature-length movies Casablanca and American Beauty.
european conference on computer vision | 2014
Piotr Bojanowski; Rémi Lajugie; Francis R. Bach; Ivan Laptev; Jean Ponce; Cordelia Schmid; Josef Sivic
We are given a set of video clips, each one annotated with an ordered list of actions, such as “walk” then “sit” then “answer phone” extracted from, for example, the associated text script. We seek to temporally localize the individual actions in each clip as well as to learn a discriminative classifier for each action. We formulate the problem as a weakly supervised temporal assignment with ordering constraints. Each video clip is divided into small time intervals and each time interval of each video clip is assigned one action label, while respecting the order in which the action labels appear in the given annotations. We show that the action label assignment can be determined together with learning a classifier for each action in a discriminative manner. We evaluate the proposed model on a new and challenging dataset of 937 video clips with a total of 787720 frames containing sequences of 16 different actions from 69 Hollywood movies.
international conference on computer vision | 2015
Piotr Bojanowski; Rémi Lajugie; Edouard Grave; Francis R. Bach; Ivan Laptev; Jean Ponce; Cordelia Schmid
Suppose that we are given a set of videos, along with natural language descriptions in the form of multiple sentences (e.g., manual annotations, movie scripts, sport summaries etc.), and that these sentences appear in the same temporal order as their visual counterparts. We propose in this paper a method for aligning the two modalities, i.e., automatically providing a time (frame) stamp for every sentence. Given vectorial features for both video and text, this can be cast as a temporal assignment problem, with an implicit linear mapping between the two feature modalities. We formulate this problem as an integer quadratic program, and solve its continuous convex relaxation using an efficient conditional gradient algorithm. Several rounding procedures are proposed to construct the final integer solution. After demonstrating significant improvements over the state of the art on the related task of aligning video with symbolic labels [7], we evaluate our method on a challenging dataset of videos with associated textual descriptions [37], and explore bag-of-words and continuous representations for text.
computer vision and pattern recognition | 2016
Jean-Baptiste Alayrac; Piotr Bojanowski; Nishant Agrawal; Josef Sivic; Ivan Laptev; Simon Lacoste-Julien
We address the problem of automatically learning the main steps to complete a certain task, such as changing a car tire, from a set of narrated instruction videos. The contributions of this paper are three-fold. First, we develop a new unsupervised learning approach that takes advantage of the complementary nature of the input video and the associated narration. The method solves two clustering problems, one in text and one in video, applied one after the other and linked by joint constraints to obtain a single coherent sequence of steps in both modalities. Second, we collect and annotate a new challenging dataset of real-world instruction videos from the Internet. The dataset contains about 800,000 frames for five different tasks that include complex interactions between people and objects, and are captured in a variety of indoor and outdoor settings. Third, we experimentally demonstrate that the proposed method can automatically discover, in an unsupervised manner, the main steps to achieve the task and locate the steps in the input videos.
computer vision and pattern recognition | 2016
Guillaume Seguin; Piotr Bojanowski; Rémi Lajugie; Ivan Laptev
We address the problem of segmenting multiple object instances in complex videos. Our method does not require manual pixel-level annotation for training, and relies instead on readily available object detectors or visual object tracking only. Given object bounding boxes as input, we cast video segmentation as a weakly-supervised learning problem. Our proposed objective combines (a) a discriminative clustering term for background segmentation, (b) a spectral clustering term for grouping pixels of the same object instance, and (c) linear constraints enabling instance-level segmentation. We propose a convex relaxation of this problem and solve it efficiently using the Frank-Wolfe algorithm. We report results and compare our method to several baselines on a new video dataset for multi-instance person segmentation.
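As a rough illustration of the Frank-Wolfe algorithm mentioned in this abstract, the sketch below minimizes a simple quadratic over the probability simplex. It is not the paper's segmentation objective: the function name `frank_wolfe_simplex`, the toy objective 0.5 * ||x - target||^2, and the step size 2 / (k + 2) are assumptions chosen only to show the linear-minimization step that characterizes the method.

```python
def frank_wolfe_simplex(target, iters=2000):
    """Minimize 0.5 * ||x - target||^2 over the probability simplex.

    Toy objective (an assumption for illustration), solved with the
    standard Frank-Wolfe (conditional gradient) iteration.
    """
    n = len(target)
    x = [1.0 / n] * n  # start at the center of the simplex
    for k in range(iters):
        # Gradient of the quadratic objective at the current point.
        grad = [xi - ti for xi, ti in zip(x, target)]
        # Linear minimization over the simplex: the minimizer is the
        # vertex whose gradient coordinate is smallest.
        i = min(range(n), key=lambda j: grad[j])
        # Classic diminishing step size.
        gamma = 2.0 / (k + 2)
        # Convex combination of the iterate and the chosen vertex,
        # so the iterate never leaves the simplex.
        x = [(1.0 - gamma) * xi for xi in x]
        x[i] += gamma
    return x


if __name__ == "__main__":
    print(frank_wolfe_simplex([0.2, 0.3, 0.5]))
```

Each iteration only needs the gradient and an argmin over vertices, with no projection step, which is why the method suits problems with simple linear constraints.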
international conference on acoustics, speech, and signal processing | 2016
Rémi Lajugie; Piotr Bojanowski; Philippe Cuvillier; Sylvain Arlot; Francis R. Bach
In this paper, we consider a new discriminative approach to the problem of audio-to-score alignment. We consider two distinct kinds of information provided by music scores: (i) an exact ordered list of musical events and (ii) approximate prior information about the relative duration of events. We extend the basic dynamic time warping algorithm to a convex problem that learns optimal classifiers for all events while jointly aligning files, using only weak supervision. We show that the relative duration between events can easily be used as a penalization in our cost function and allows us to drastically improve the performance of our approach. We demonstrate the validity of our approach on a large and realistic dataset.
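For reference, the basic dynamic time warping algorithm that this abstract takes as its starting point can be sketched as follows. This is only the textbook recurrence on two 1-D sequences with an absolute-difference local cost (both assumptions for illustration); the convex extension, the learned classifiers, and the duration penalty described above are not shown. The name `dtw_cost` is hypothetical.

```python
def dtw_cost(a, b):
    """Textbook dynamic time warping cost between two 1-D sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] is the best cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Local cost: absolute difference (an illustrative choice).
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor
            # alignments: advance in a, advance in b, or advance in both.
            cost[i][j] = d + min(cost[i - 1][j],
                                 cost[i][j - 1],
                                 cost[i - 1][j - 1])
    return cost[n][m]


if __name__ == "__main__":
    print(dtw_cost([1, 2, 3], [1, 2, 2, 3]))
```

The alignment is monotone by construction, which matches the ordered list of musical events that a score provides.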
conference of the european chapter of the association for computational linguistics | 2017
Armand Joulin; Edouard Grave; Piotr Bojanowski; Tomas Mikolov
international conference on machine learning | 2017
Moustapha Cisse; Piotr Bojanowski; Edouard Grave; Yann N. Dauphin; Nicolas Usunier
language resources and evaluation | 2017
Tomas Mikolov; Edouard Grave; Piotr Bojanowski; Christian Puhrsch; Armand Joulin
arXiv: Learning | 2015
Piotr Bojanowski; Armand Joulin; Tomas Mikolov