Siyu Tang | Researchain

Publication

Featured research published by Siyu Tang.

Computer Vision and Pattern Recognition | 2016

DeepCut: Joint Subset Partition and Labeling for Multi Person Pose Estimation

Leonid Pishchulin; Eldar Insafutdinov; Siyu Tang; Bjoern Andres; Mykhaylo Andriluka; Peter V. Gehler; Bernt Schiele

This paper considers the task of articulated human pose estimation of multiple people in real-world images. We propose an approach that jointly solves the tasks of detection and pose estimation: it infers the number of persons in a scene, identifies occluded body parts, and disambiguates body parts between people in close proximity to each other. This joint formulation is in contrast to previous strategies that address the problem by first detecting people and subsequently estimating their body pose. We propose a partitioning and labeling formulation of a set of body-part hypotheses generated with CNN-based part detectors. Our formulation, an instance of an integer linear program, implicitly performs non-maximum suppression on the set of part candidates and groups them to form configurations of body parts respecting geometric and appearance constraints. Experiments on four different datasets demonstrate state-of-the-art results for both single-person and multi-person pose estimation.

International Journal of Computer Vision | 2014

Detection and Tracking of Occluded People

Siyu Tang; Mykhaylo Andriluka; Bernt Schiele

We consider the problem of detection and tracking of multiple people in crowded street scenes. State-of-the-art methods perform well in scenes with relatively few people, but are severely challenged by scenes with many subjects that partially occlude each other. This limitation arises because current people detectors fail when persons are strongly occluded. We observe that typical occlusions are due to overlaps between people and propose a people detector tailored to various occlusion levels. Instead of treating partial occlusions as distractions, we leverage the fact that person/person occlusions result in very characteristic appearance patterns that can help to improve detection results. We demonstrate the performance of our occlusion-aware person detector on a new dataset of people with controlled but severe levels of occlusion and on two challenging publicly available benchmarks, outperforming single-person detectors in each case.

International Conference on Computer Vision | 2013

Learning People Detectors for Tracking in Crowded Scenes

Siyu Tang; Mykhaylo Andriluka; Anton Milan; Konrad Schindler; Stefan Roth; Bernt Schiele

People tracking in crowded real-world scenes is challenging due to frequent and long-term occlusions. Recent tracking methods obtain the image evidence from object (people) detectors, but typically use off-the-shelf detectors and treat them as black-box components. In this paper we argue that for best performance one should instead explicitly train people detectors on failure cases of the overall tracker. To that end, we first propose a novel joint people detector that combines a state-of-the-art single-person detector with a detector for pairs of people, which explicitly exploits common patterns of person-person occlusions across multiple viewpoints that are a frequent failure case for tracking in crowded scenes. To explicitly address remaining failure modes of the tracker we explore two methods. First, we analyze typical failures of trackers and train a detector explicitly on these cases. Second, we train the detector with the people tracker in the loop, focusing on the most common tracker failures. We show that our joint multi-person detector significantly improves both detection accuracy and tracker performance, improving the state of the art on standard benchmarks.

Computer Vision and Pattern Recognition | 2015

Subgraph decomposition for multi-target tracking

Siyu Tang; Bjoern Andres; Mykhaylo Andriluka; Bernt Schiele

Tracking multiple targets in a video, based on a finite set of detection hypotheses, is a persistent problem in computer vision. A common strategy for tracking is to first select hypotheses spatially and then to link these over time while maintaining disjoint path constraints [14, 15, 24]. In crowded scenes, multiple hypotheses will often be similar to each other, making selection of optimal links an unnecessarily hard optimization problem due to the sequential treatment of space and time. Embracing this observation, we propose to link and cluster plausible detections jointly across space and time. Specifically, we state multi-target tracking as a Minimum Cost Subgraph Multicut Problem. Evidence about pairs of detection hypotheses is incorporated whether the detections are in the same frame, neighboring frames or distant frames. This facilitates long-range re-identification and within-frame clustering. Results for published benchmark sequences demonstrate the superiority of this approach.

Computer Vision and Pattern Recognition | 2017

Joint Graph Decomposition & Node Labeling: Problem, Algorithms, Applications

Evgeny Levinkov; Jonas Uhrig; Siyu Tang; Mohamed Omran; Eldar Insafutdinov; Alexander Kirillov; Carsten Rother; Thomas Brox; Bernt Schiele; Bjoern Andres

We state a combinatorial optimization problem whose feasible solutions define both a decomposition and a node labeling of a given graph. This problem offers a common mathematical abstraction of seemingly unrelated computer vision tasks, including instance-separating semantic segmentation, articulated human body pose estimation and multiple object tracking. Conceptually, it generalizes the unconstrained integer quadratic program and the minimum cost lifted multicut problem, both of which are NP-hard. In order to find feasible solutions efficiently, we define two local search algorithms that converge monotonically to a local optimum, offering a feasible solution at any time. To demonstrate the effectiveness of these algorithms in tackling computer vision tasks, we apply them to instances of the problem that we construct from published data, using published algorithms. We report state-of-the-art application-specific accuracy in the three above-mentioned applications.
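
The idea of a local search that improves monotonically and holds a feasible solution at every step can be illustrated with a toy sketch. This is not either of the paper's two algorithms: it is a plain greedy node-move heuristic, it covers only the decomposition part of the problem (no node class labels), and the function names, the cost convention (a cost is paid when an edge's endpoints are separated) and the example costs are all assumptions made for illustration.

```python
def cut_cost(labels, edge_costs):
    """Sum the costs of all edges whose endpoints lie in different components."""
    return sum(c for (u, v), c in edge_costs.items() if labels[u] != labels[v])


def greedy_node_moves(num_nodes, edge_costs):
    """Toy local search: move one node at a time to another component
    whenever that strictly lowers the cut cost (hypothetical helper)."""
    labels = list(range(num_nodes))  # start with every node in its own component
    cost = cut_cost(labels, edge_costs)
    history = [cost]  # cost after each accepted move
    improved = True
    while improved:
        improved = False
        for node in range(num_nodes):
            for new_label in range(num_nodes):
                if new_label == labels[node]:
                    continue
                old_label = labels[node]
                labels[node] = new_label
                new_cost = cut_cost(labels, edge_costs)
                if new_cost < cost:
                    # accept the move; the labeling is feasible at every step
                    cost = new_cost
                    history.append(cost)
                    improved = True
                else:
                    labels[node] = old_label  # undo the move
    return labels, history


# Assumed example costs: positive = endpoints prefer the same component,
# negative = endpoints prefer different components.
example_costs = {(0, 1): 2.0, (2, 3): 1.5, (1, 2): -3.0, (0, 3): -1.0, (0, 2): -0.5}
final_labels, cost_history = greedy_node_moves(4, example_costs)
```

Because a move is accepted only when it strictly lowers the cost, the recorded cost sequence is strictly decreasing and the search stops at a local optimum, which need not be the global one in general.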

Computer Vision and Pattern Recognition | 2017

ArtTrack: Articulated Multi-Person Tracking in the Wild

Eldar Insafutdinov; Mykhaylo Andriluka; Leonid Pishchulin; Siyu Tang; Evgeny Levinkov; Bjoern Andres; Bernt Schiele

In this paper we propose an approach for articulated tracking of multiple people in unconstrained videos. Our starting point is a model that resembles existing architectures for single-frame pose estimation but is substantially faster. We achieve this in two ways: (1) by simplifying and sparsifying the body-part relationship graph and leveraging recent methods for faster inference, and (2) by offloading a substantial share of computation onto a feed-forward convolutional architecture that is able to detect and associate body joints of the same person even in clutter. We use this model to generate proposals for body joint locations and formulate articulated tracking as spatio-temporal grouping of such proposals. This allows us to jointly solve the association problem for all people in the scene by propagating evidence from strong detections through time and enforcing constraints that each proposal can be assigned to one person only. We report results on the public MPII Human Pose benchmark and on a new MPII Video Pose dataset of image sequences with multiple people. We demonstrate that our model achieves state-of-the-art results while using only a fraction of the time and is able to leverage temporal information to improve the state of the art for crowded scenes.

European Conference on Computer Vision | 2016

Multi-Person Tracking by Multicut and Deep Matching

Siyu Tang; Bjoern Andres; Mykhaylo Andriluka; Bernt Schiele

In Tang et al. (2015), we proposed a graph-based formulation that links and clusters person hypotheses over time by solving a minimum cost subgraph multicut problem. In this paper, we modify and extend Tang et al. (2015) in three ways: (1) We introduce a novel local pairwise feature based on local appearance matching that is robust to partial occlusion and camera motion. (2) We perform extensive experiments to compare different pairwise potentials and to analyze the robustness of the tracking formulation. (3) We consider a plain multicut problem and remove outlying clusters from its solution. This allows us to employ an efficient primal feasible optimization algorithm that is not applicable to the subgraph multicut problem of Tang et al. (2015). Unlike the branch-and-cut algorithm used there, the efficient algorithm used here is applicable to long videos and many detections. Together with the novel pairwise feature, it eliminates the need for the intermediate tracklet representation of Tang et al. (2015). We demonstrate the effectiveness of our overall approach on the MOT16 benchmark (Milan et al. 2016), achieving state-of-the-art performance.
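
To make the plain multicut objective concrete, here is a minimal brute-force sketch on a four-node toy graph. It is not the primal feasible algorithm or the branch-and-cut method mentioned in the abstract, and it only works for very small graphs. The function name, the sign convention (a cost is paid when an edge is cut, so positive costs favor joining two detections) and the example costs are assumptions made for illustration.

```python
from itertools import product


def brute_force_multicut(num_nodes, edge_costs):
    """Enumerate every node labeling and return the cheapest induced multicut.

    edge_costs maps (u, v) to the cost paid when u and v end up in
    different components (assumed convention). Every labeling induces a
    valid multicut, and every multicut is induced by some labeling.
    """
    best_cost, best_labels = float("inf"), None
    for labels in product(range(num_nodes), repeat=num_nodes):
        cost = sum(c for (u, v), c in edge_costs.items() if labels[u] != labels[v])
        if cost < best_cost:
            best_cost, best_labels = cost, labels
    return best_cost, best_labels


# Assumed example: detections 0 and 1 look alike, as do 2 and 3;
# the remaining pairs look different and carry negative cut costs.
toy_costs = {(0, 1): 2.0, (2, 3): 1.5, (1, 2): -3.0, (0, 3): -1.0, (0, 2): -0.5}
best_cost, best_labels = brute_force_multicut(4, toy_costs)
```

In this toy instance the cheapest solution cuts exactly the negative-cost edges, grouping detections 0 and 1 into one cluster and 2 and 3 into another. The number of clusters is not fixed in advance; it falls out of the optimization.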

Computer Vision and Pattern Recognition | 2017

Multiple People Tracking by Lifted Multicut and Person Re-identification

Siyu Tang; Mykhaylo Andriluka; Bjoern Andres; Bernt Schiele

Tracking multiple persons in a monocular video of a crowded scene is a challenging task. Humans can master it even if they lose track of a person locally by re-identifying the same person based on their appearance. Care must be taken across long distances, as similar-looking persons need not be identical. In this work, we propose a novel graph-based formulation that links and clusters person hypotheses over time by solving an instance of a minimum cost lifted multicut problem. Our model generalizes previous works by introducing a mechanism for adding long-range attractive connections between nodes in the graph without modifying the original set of feasible solutions. This allows us to reward tracks that assign detections of similar appearance to the same person in a way that does not introduce implausible solutions. To effectively match hypotheses over longer temporal gaps we develop new deep architectures for re-identification of people. They combine holistic representations extracted with deep networks and body pose layout obtained with a state-of-the-art pose estimation model. We demonstrate the effectiveness of our formulation by reporting a new state of the art for the MOT16 benchmark. The code and pre-trained models are publicly available.

Computer Vision and Pattern Recognition | 2017

Generating Descriptions with Grounded and Co-referenced People

Anna Rohrbach; Marcus Rohrbach; Siyu Tang; Seong Joon Oh; Bernt Schiele

Learning how to generate descriptions of images or videos has received major interest in both the Computer Vision and Natural Language Processing communities. While a few works have proposed to learn a grounding during the generation process in an unsupervised way (via an attention mechanism), it remains unclear how good the quality of the grounding is and whether it benefits the description quality. In this work we propose a movie description model which learns to generate descriptions and jointly ground (localize) the mentioned characters, as well as perform visual co-reference resolution between pairs of consecutive sentences/clips. We also propose to use weak localization supervision through character mentions provided in movie descriptions to learn the character grounding. At training time, we first learn how to localize characters by relating their visual appearance to mentions in the descriptions via a semi-supervised approach. We then feed this (noisy) supervision into our description model, which greatly improves its performance. Our proposed description model improves over prior work w.r.t. generated description quality and additionally provides grounding and local co-reference resolution. We evaluate it on the MPII Movie Description dataset using automatic and human evaluation measures and using our newly collected grounding and co-reference data for characters.

International Conference on Information Technology: New Generations | 2011

OpenBioSafetyLab: A Virtual World Based Biosafety Training Application for Medical Students

Arturo Nakasone; Siyu Tang; Mika Shigematsu; Berthold Heinecke; Shuji Fujimoto; Helmut Prendinger

Recently, virtual world technology has been successfully used to create interesting and useful applications in the context of scientific research, ranging from visualization of static 3D representations of molecular entities to full execution of astrophysical simulations. Unfortunately, the majority of these applications are limited by the current functionality of the virtual world client software, constraining them to provide only the most basic features of user interfaces. Thus, the implementation of applications such as training programs becomes quite challenging due to the high level of interaction a user must have with the training environment itself. In order to analyze the level of usability for virtual world based applications, we introduce Open Bio Safety Lab, our implementation of a virtual world based training application in bio-risk management, which is one of the most important areas for training systems nowadays. We also present the results of our exploratory test study with twenty-four subjects, which indicates a high degree of usability of our system, not only for the testing aspect of the training, but also for the learning aspect.