Belhassen Bayar
Drexel University
Publication
Featured research published by Belhassen Bayar.
information hiding | 2016
Belhassen Bayar; Matthew C. Stamm
When creating a forgery, a forger can modify an image using many different image editing operations. Since a forensic examiner must test for each of these, significant interest has arisen in the development of universal forensic algorithms capable of detecting many different image editing operations and manipulations. In this paper, we propose a universal forensic approach to performing manipulation detection using deep learning. Specifically, we propose a new convolutional network architecture capable of automatically learning manipulation detection features directly from training data. In their current form, convolutional neural networks will learn features that capture an image's content as opposed to manipulation detection features. To overcome this issue, we develop a new form of convolutional layer that is specifically designed to suppress an image's content and adaptively learn manipulation detection features. Through a series of experiments, we demonstrate that our proposed approach can automatically learn how to detect multiple image manipulations without relying on pre-selected features or any preprocessing. The results of these experiments show that our proposed approach can automatically detect several different manipulations with an average accuracy of 99.10%.
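The content-suppressing convolutional layer described in this abstract can be sketched as a constraint projection applied to the first-layer filters after each training update. This is a minimal NumPy sketch, assuming the constraint fixes the centre weight to -1 and rescales the remaining weights to sum to 1 so the filter computes a prediction-error residual; the function name and exact values are illustrative.

```python
import numpy as np

def project_constrained_filter(w):
    """Project a 2-D filter onto the constraint set: centre weight fixed
    to -1, remaining weights rescaled to sum to 1, so the filter learns
    pixel-prediction residuals rather than image content."""
    w = w.astype(float).copy()
    c = (w.shape[0] // 2, w.shape[1] // 2)
    w[c] = 0.0                       # exclude the centre from the sum
    s = w.sum()
    if s != 0:
        w /= s                       # off-centre weights now sum to 1
    w[c] = -1.0                      # centre "predicts and subtracts" the pixel
    return w
```

In training, such a projection would run after every gradient step so the layer stays on the constraint set while the off-centre weights are still learned adaptively from data.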
Journal of Bioinformatics and Computational Biology | 2014
Belhassen Bayar; Nidhal Bouaynaya; Roman Shterenberg
Non-negative matrix factorization (NMF) has proven to be a useful decomposition technique for multivariate data, where the non-negativity constraint is necessary to have a meaningful physical interpretation. NMF reduces the dimensionality of non-negative data by decomposing it into two smaller non-negative factors with physical interpretation for class discovery. The NMF algorithm, however, assumes a deterministic framework. In particular, the effects of data noise on the stability of the factorization and the convergence of the algorithm are unknown. Collected data, on the other hand, is stochastic in nature due to measurement noise and sometimes inherent variability in the physical process. This paper presents new theoretical and applied developments to the problem of non-negative matrix factorization (NMF). First, we generalize the deterministic NMF algorithm to include a general class of update rules that converges towards an optimal non-negative factorization. Second, we extend the NMF framework to the probabilistic case (PNMF). We show that the maximum a posteriori (MAP) estimate of the non-negative factors is the solution to a weighted regularized non-negative matrix factorization problem. We subsequently derive update rules that converge towards an optimal solution. Third, we apply the PNMF to cluster and classify DNA microarray data. The proposed PNMF is shown to outperform the deterministic NMF and the sparse NMF algorithms in clustering stability and classification accuracy.
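For context, the classical deterministic update rules that this paper generalizes are the multiplicative updates of Lee and Seung, which preserve non-negativity at every step. A minimal NumPy sketch (the PNMF/MAP update rules derived in the paper differ from these):

```python
import numpy as np

def nmf_multiplicative(V, k, iters=500, eps=1e-9, seed=0):
    """Classical multiplicative-update NMF: factor non-negative V as W @ H
    with W (n x k) and H (k x m) non-negative. Each update multiplies by a
    non-negative ratio, so non-negativity is preserved automatically."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # update H
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # update W
    return W, H
```

On exactly low-rank non-negative data these updates drive the reconstruction error close to zero; the paper's probabilistic extension additionally weights and regularizes the objective to account for measurement noise.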
international conference on acoustics, speech, and signal processing | 2017
Belhassen Bayar; Matthew C. Stamm
Detecting image resampling in re-compressed images is a very challenging problem. Existing approaches to image resampling detection operate by building a pre-selected model to locate periodicities in linear predictor residues. Additionally, if an image was JPEG compressed before resampling, existing techniques detect tampering using the artifacts left by the pre-compression. However, state-of-the-art approaches cannot detect resampling in re-compressed images initially compressed with a high quality factor. In this paper, we propose a novel deep learning approach to adaptively learn resampling detection features directly from data. To accomplish this, we use our recently proposed constrained convolutional layer. Through a set of experiments we evaluate the effectiveness of our proposed constrained convolutional neural network (CNN) at detecting resampling in re-compressed images. The results of these experiments show that our constrained CNN can accurately detect resampling in re-compressed images in scenarios where previous approaches fail.
electronic imaging | 2017
Belhassen Bayar; Matthew C. Stamm
Convolutional neural networks (CNNs) have received significant attention due to their ability to adaptively learn classification features directly from data. While CNNs have helped cause dramatic advances in fields such as object and speech recognition, multimedia forensics is a fundamentally different problem from other deep learning applications. Little work exists to guide the design of CNN architectures for forensic tasks. Furthermore, it is still unclear which forensic tasks can be performed using CNNs. In this work, we investigate the design of CNNs for multiple multimedia forensic applications. We show that CNNs are capable of performing image manipulation detection as well as camera model identification. Through a series of experiments, we systematically examine the influence of several important CNN design choices for forensic applications, such as the use of a constrained convolutional layer or fixed high-pass filter at the beginning of the CNN, the use of nonlinearity after the first layer, the choice of activation and pooling functions, etc. We show that different CNN design choices should be made for different forensic applications and identify design choices to maximize the performance of CNNs for manipulation detection and camera model identification.
Introduction
Multimedia information, such as digital images, is frequently used in numerous important settings, such as evidence in legal proceedings, criminal investigations, and military and defense scenarios. Since this information can easily be edited or falsified, information forensics researchers have developed a wide variety of methods to forensically determine the authenticity and source of multimedia data [27]. Many early forensic approaches were developed by theoretically or heuristically identifying a set of traces left in an image by a particular processing operation or source device.
For example, techniques have been developed to detect specific traces left by resampling and resizing [22, 14], contrast enhancement [26], median filtering [15, 13], sharpening [2], and many other operations. Similarly, forensic algorithms have been developed to identify the model of an image’s source camera using specific traces left by different elements of the camera’s internal processing pipeline [28, 3, 8]. More recent data-driven forensic approaches have leveraged techniques from steganalysis research that capture local pixel dependencies using high dimensional feature sets [9, 20]. These approaches have been used to detect image editing [23] and perform camera model identification [4, 19]. While these techniques have shown significant improvements in manipulation detection or camera model identification accuracy, researchers are still left with questions such as: Are these the best set of classification features for forensic tasks? Can forensic traces and classification features be learned directly from data? Recently, researchers have shown significant interest in using techniques from deep learning to address problems in multimedia security such as image manipulation detection and steganalysis [1, 21]. Tools such as convolutional neural networks (CNNs) [17] show particular promise due to their ability to adaptively learn decision features from large sets of data. While CNNs have been successfully applied to computer vision problems such as object recognition [16, 29], lessons learned from this field do not necessarily translate to multimedia forensics. Because multimedia forensics is a fundamentally different problem, certain design principles that are successful when building CNNs to perform object recognition are suboptimal for multimedia forensic tasks.
In their standard form, CNNs tend to learn features related to the image’s content, whereas in image forensics tasks we need to suppress an image’s content and capture pixel value dependencies induced by an editing operation (e.g., the image tampering detection task [23]) or the camera’s image processing pipeline (e.g., the camera model identification task using models of the camera’s demosaicing algorithm [28]). If CNNs are used in their existing form, this will lead to a classifier that identifies the objects and scenes associated with the camera as opposed to learning image forensic classification features. In response to this problem, two methods have emerged, namely the predetermined high-pass filter based deep learning approach used in steganography [21] and the ‘constrained convolutional’ layer adaptive approach used to perform image manipulation detection [1]. Since the use of deep learning approaches for multimedia security applications is still in its infancy, little work exists to guide the design of CNN architectures for forensic tasks. In addition, no work has explicitly examined and compared the performance of different proposed network topologies. As a result, several open questions currently exist with regard to the design and training of CNNs for multimedia forensics. For example: Which problems in multimedia forensics can CNNs be successfully applied to? Multiple approaches have recently been proposed for the design of the initial CNN layer (i.e., high-pass filter or constrained convolutional layer). Which of these yields the best performance? Do certain other design parameters such as the pooling technique or choice of activation function have significant effects on the CNN’s accuracy? Do these design choices vary depending on the forensic task being considered? What effect do different training techniques such as batch normalization and local contrast normalization have on the CNN’s classification accuracy?
In order to guide future research into the application of deep learning techniques for multimedia security, it is important to address these questions. In this paper, we systematically investigate several CNN design choices, then use the results of our investigation to present
IS&T International Symposium on Electronic Imaging 2017: Media Watermarking, Security, and Forensics 2017. https://doi.org/10.2352/ISSN.2470-1173.2017.7.MWSF-328
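One of the two first-layer options compared above, the fixed high-pass filter borrowed from steganalysis, can be illustrated directly. The sketch below uses the widely cited 5x5 "KV" kernel from the steganalysis literature; its coefficients sum to zero, so it suppresses smooth image content and exposes the local pixel-dependency residuals the rest of the CNN would then classify. The kernel choice and helper name are illustrative.

```python
import numpy as np

# Fixed 5x5 high-pass (KV) residual filter from steganalysis; coefficients
# sum to zero, so constant (content-only) regions map to zero response.
KV = (1.0 / 12.0) * np.array([
    [-1,  2,  -2,  2, -1],
    [ 2, -6,   8, -6,  2],
    [-2,  8, -12,  8, -2],
    [ 2, -6,   8, -6,  2],
    [-1,  2,  -2,  2, -1],
])

def residual(img):
    """'Valid' 2-D correlation of a grayscale image with KV."""
    h, w = img.shape
    out = np.empty((h - 4, w - 4))
    for i in range(h - 4):
        for j in range(w - 4):
            out[i, j] = np.sum(img[i:i + 5, j:j + 5] * KV)
    return out
```

In the fixed-filter design this residual map, not the raw image, is what the first trainable CNN layer receives; the constrained-layer alternative instead learns such a residual filter from data.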
IEEE Journal of Biomedical and Health Informatics | 2017
Belhassen Bayar; Nidhal Bouaynaya; Roman Shterenberg
We consider a high-dimensional, low sample-size multivariate regression problem that accounts for correlation of the response variables. The system is underdetermined as there are more parameters than samples. We show that the maximum likelihood approach with covariance estimation is senseless because the likelihood diverges. We subsequently propose a normalization of the likelihood function that guarantees convergence. We call this method small-sample multivariate regression with covariance (SMURC) estimation. We derive an optimization problem and its convex approximation to compute SMURC. Simulation results show that the proposed algorithm outperforms the regularized likelihood estimator with known covariance matrix and the sparse conditional Gaussian graphical model. We also apply SMURC to the inference of the wing-muscle gene network of the Drosophila melanogaster (fruit fly).
ieee global conference on signal and information processing | 2014
Belhassen Bayar; Nidhal Bouaynaya; Roman Shterenberg
Compressive sensing is the theory of sparse signal recovery from undersampled measurements or observations. Exact signal reconstruction is an NP-hard problem. A convex approximation using the l1-norm has received a great deal of theoretical attention. Exact recovery using the l1 approximation is only possible under strict conditions on the measurement matrix, which are difficult to check. Many greedy algorithms have thus been proposed. However, none of them is guaranteed to lead to the optimal (sparsest) solution. In this paper, we present a new greedy algorithm that provides an exact sparse solution of the problem. Unlike other greedy approaches, which are only approximations of the exact sparse solution, the proposed greedy approach, called Kernel Reconstruction, leads to the exact optimal solution in fewer operations than the original combinatorial problem. An application to the recovery of sparse gene regulatory networks is presented.
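The l1 convex approximation mentioned in this abstract (basis pursuit) is itself easy to sketch: min ||x||_1 subject to Ax = b becomes a linear program after splitting x = u - v with u, v >= 0. The SciPy-based sketch below illustrates that relaxation only; it is not the paper's Kernel Reconstruction algorithm.

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(A, b):
    """Solve min ||x||_1 s.t. Ax = b as an LP: with x = u - v, u, v >= 0,
    minimize sum(u) + sum(v) subject to A(u - v) = b."""
    m, n = A.shape
    c = np.ones(2 * n)                 # objective: ||x||_1 = sum(u) + sum(v)
    A_eq = np.hstack([A, -A])          # encodes A(u - v) = b
    res = linprog(c, A_eq=A_eq, b_eq=b, bounds=(0, None), method="highs")
    u, v = res.x[:n], res.x[n:]
    return u - v
```

Any planted sparse vector x0 with A @ x0 = b is feasible for this LP, so the returned solution never has larger l1 norm than x0; whether it equals x0 exactly depends on the measurement-matrix conditions the abstract alludes to.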
information hiding | 2018
Owen Mayer; Belhassen Bayar; Matthew C. Stamm
Recently, deep learning researchers have developed a technique known as deep features in which feature extractors for a task are learned by a CNN. These features are then provided to another classifier, or even used to perform a different classification task. Research in deep learning suggests that in some cases, deep features generalize to seemingly unrelated tasks. In this paper, we develop techniques for learning deep features that can be used across multiple forensic tasks, namely image manipulation detection and camera model identification. To do this, we develop two approaches for building deep forensic features: a transfer learning approach and a multitask learning approach. We experimentally evaluate the performance of both approaches in several scenarios and find that: 1) features learned for camera model identification generalize well to manipulation detection tasks but manipulation detection features do not generalize well to camera model identification, suggesting a task asymmetry, 2) deeper features are more task specific while shallower features generalize well across tasks, suggesting a feature hierarchy, and 3) a single, unified feature extractor can be learned that is highly discriminative for multiple forensic tasks. Furthermore, we find that when there is limited training data, a unified feature extractor can significantly outperform a targeted CNN.
information hiding | 2017
Belhassen Bayar; Matthew C. Stamm
Estimating manipulation parameter values is an important problem in image forensics. While several algorithms have been proposed to accomplish this, their application is exclusively limited to one type of image manipulation. These existing techniques are often designed using classical approaches from estimation theory by constructing parametric models of image data. This is problematic since this process of developing a theoretical model then deriving a parameter estimator must be repeated each time a new image manipulation is developed. In this paper, we propose a new data-driven generic approach to performing manipulation parameter estimation. Our proposed approach can be adapted to operate on several different manipulations without requiring a forensic investigator to make substantial changes to the proposed method. To accomplish this, we reformulate estimation as a classification problem by partitioning the parameter space into disjoint subsets such that each parameter subset is assigned a distinct class. Subsequently, we design a constrained CNN-based classifier that is able to extract classification features directly from data as well as estimate the manipulation parameter value in a subject image. Through a set of experiments, we demonstrate the effectiveness of our approach using four different types of manipulations.
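The estimation-as-classification reformulation described here can be sketched in a few lines: partition the parameter range into disjoint bins, label each bin as a class, and read the final estimate off the predicted bin. The helper names, bin edges, and midpoint-as-representative convention below are illustrative, assuming uniform bins over a known parameter range.

```python
import numpy as np

def make_classes(lo, hi, n_bins):
    """Partition the parameter range [lo, hi) into n_bins disjoint subsets;
    each subset is one class, represented here by its midpoint."""
    edges = np.linspace(lo, hi, n_bins + 1)
    mids = (edges[:-1] + edges[1:]) / 2   # class index -> parameter estimate
    return edges, mids

def param_to_class(theta, edges):
    """Training-label map: true parameter value -> class index."""
    return int(np.clip(np.searchsorted(edges, theta, side="right") - 1,
                       0, len(edges) - 2))

# A classifier trained on these labels outputs a class index c for a
# subject image; the parameter estimate is then mids[c].
```

The estimation error of this scheme is bounded by half the bin width, so the bin count trades off estimation resolution against the number of classes the classifier must separate.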
international conference on bioinformatics | 2011
Belhassen Bayar; Nidhal Bouaynaya; Roman Shterenberg
Non-negative matrix factorization (NMF) has proven to be a useful decomposition for multivariate data. Specifically, NMF appears to have advantages over other clustering methods, such as hierarchical clustering, for identification of distinct molecular patterns in gene expression profiles. The NMF algorithm, however, is deterministic. In particular, it does not take into account the noisy nature of the measured genomic signals. In this paper, we extend the NMF algorithm to the probabilistic case, where the data is viewed as a stochastic process. We show that the probabilistic NMF can be viewed as a weighted regularized matrix factorization problem, and derive the corresponding update rules. Our simulation results show that the probabilistic non-negative matrix factorization (PNMF) algorithm is more accurate and more robust than its deterministic homologue in clustering cancer subtypes in a leukemia microarray dataset.
international conference on acoustics, speech, and signal processing | 2018
Belhassen Bayar; Matthew C. Stamm