Joan Bruna | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Joan Bruna is active.

Explore More

Publication

Featured researches published by Joan Bruna.

IEEE Transactions on Pattern Analysis and Machine Intelligence | 2013

Invariant Scattering Convolution Networks

Joan Bruna; Stéphane Mallat

A wavelet scattering network computes a translation invariant image representation which is stable to deformations and preserves high-frequency information for classification. It cascades wavelet transform convolutions with nonlinear modulus and averaging operators. The first network layer outputs SIFT-type descriptors, whereas the next layers provide complementary invariant information that improves classification. The mathematical analysis of wavelet scattering networks explains important properties of deep convolution networks for classification. A scattering representation of stationary processes incorporates higher order moments and can thus discriminate textures having the same Fourier power spectrum. State-of-the-art classification results are obtained for handwritten digits and texture discrimination, with a Gaussian kernel SVM and a generative PCA classifier.

IEEE Signal Processing Magazine | 2017

Geometric Deep Learning: Going beyond Euclidean data

Michael M. Bronstein; Joan Bruna; Yann LeCun; Arthur Szlam; Pierre Vandergheynst

Many scientific fields study data with an underlying structure that is non-Euclidean. Some examples include social networks in computational social sciences, sensor networks in communications, functional networks in brain imaging, regulatory networks in genetics, and meshed surfaces in computer graphics. In many applications, such geometric data are large and complex (in the case of social networks, on the scale of billions) and are natural targets for machine-learning techniques. In particular, we would like to use deep neural networks, which have recently proven to be powerful tools for a broad range of problems from computer vision, natural-language processing, and audio analysis. However, these tools have been most successful on data with an underlying Euclidean or grid-like structure and in cases where the invariances of these structures are built into networks used to model them.

computer vision and pattern recognition | 2011

Classification with scattering operators

Joan Bruna; Stéphane Mallat

A scattering vector is a local descriptor including multiscale and multi-direction co-occurrence information. It is computed with a cascade of wavelet decompositions and complex modulus. This scattering representation is locally translation invariant and linearizes deformations. A supervised classification algorithm is computed with a PCA model selection on scattering vectors. State of the art results are obtained for handwritten digit recognition and texture classification.

Neural Computation | 2016

A mathematical motivation for complex-valued convolutional networks

Mark Tygert; Joan Bruna; Soumith Chintala; Yann LeCun; Serkan Piantino; Arthur Szlam

A complex-valued convolutional network (convnet) implements the repeated application of the following composition of three operations, recursively applying the composition to an input vector of nonnegative real numbers: (1) convolution with complex-valued vectors, followed by (2) taking the absolute value of every entry of the resulting vectors, followed by (3) local averaging. For processing real-valued random vectors, complex-valued convnets can be viewed as data-driven multiscale windowed power spectra, data-driven multiscale windowed absolute spectra, data-driven multiwavelet absolute values, or (in their most general configuration) data-driven nonlinear multiwavelet packets. Indeed, complex-valued convnets can calculate multiscale windowed spectra when the convnet filters are windowed complex-valued exponentials. Standard real-valued convnets, using rectified linear units (ReLUs), sigmoidal (e.g., logistic or tanh) nonlinearities, or max pooling, for example, do not obviously exhibit the same exact correspondence with data-driven wavelets (whereas for complex-valued convnets, the correspondence is much more than just a vague analogy). Courtesy of the exact correspondence, the remarkably rich and rigorous body of mathematical analysis for wavelets applies directly to (complex-valued) convnets.

international conference on acoustics, speech, and signal processing | 2015

Source separation with scattering Non-Negative Matrix Factorization

Joan Bruna; Pablo Sprechmann; Yann LeCun

This paper presents a single-channel source separation method that extends the ideas of Nonnegative Matrix Factorization (NMF). We interpret the approach of audio demixing via NMF as a cascade of a pooled analysis operator, given for example by the magnitude spectrogram, and a synthesis operators given by the matrix decomposition. Instead of imposing the temporal consistency of the decomposition through sophisticated structured penalties in the synthesis stage, we propose to change the analysis operator for a deep scattering representation, where signals are represented at several time resolutions. This new signal representation is invariant to smooth changes in the signal, consistent with its temporal dynamics. We evaluate the proposed approach in a speech separation task obtaining promising results.

arXiv: Computer Vision and Pattern Recognition | 2011

Classification with invariant scattering representations

Joan Bruna; Stéphane Mallat

A scattering transform defines a signal representation which is invariant to translations and Lipschitz continuous relatively to deformations. It is implemented with a non-linear convolution network that iterates over wavelet and modulus operators. Lipschitz continuity locally linearizes deformations. Complex classes of signals and textures can be modeled with low-dimensional affine spaces, computed with a PCA in the scattering domain. Classification is performed with a penalized model selection. State of the art results are obtained for handwritten digit recognition over small training sets, and for texture classification. 1

international conference on latent variable analysis and signal separation | 2015

Audio Source Separation with Discriminative Scattering Networks

Pablo Sprechmann; Joan Bruna; Yann LeCun

Many monaural signal decomposition techniques proposed in the literature operate on a feature space consisting of a time-frequency representation of the input data. A challenge faced by these approaches is to effectively exploit the temporal dependencies of the signals at scales larger than the duration of a time-frame. In this work we propose to tackle this problem by modeling the signals using a time-frequency representation with multiple temporal resolutions. For this reason we use a signal representation that consists of a pyramid of wavelet scattering operators, which generalizes Constant Q Transforms CQT with extra layers of convolution and complex modulus. We first show that learning standard models with this multi-resolution setting improves source separation results over fixed-resolution methods. As study case, we use Non-Negative Matrix Factorizations NMF that has been widely considered in many audio application. Then, we investigate the inclusion of the proposed multi-resolution setting into a discriminative training regime. We discuss several alternatives using different deep neural network architectures, and our preliminary experiments suggest that in this task, finite impulse, multi-resolution Convolutional Networks are a competitive baseline compared to recurrent alternatives.

international conference on consumer electronics | 2008

Bandlets for HDTV Video Processing

Stéphane Mallat; Joan Bruna

Summary form only given. De-interlacing, scaling, frame rate conversion, compression artifact removals are currently major video processing challenges to produce high quality images for HD TV displays. Recovering high resolution images from degraded or lower resolution videos requires adaptive geometric filtering and interpolation in space and time, at very high data rate. Finding appropriate image representations for geometric image processing is currently a fundamental research problem. We first review some psychophysic and physiologic models of human geometric visual processing. We then introduce bandlet bases, which are multi-scale geometric wavelets, constructed along space-time trajectories adapted to the video content. Applications to HDTV video processing will be shown.

international conference on learning representations | 2014