Hisham Mohamed
University of Geneva
Publications
Featured research published by Hisham Mohamed.
parallel computing | 2013
Hisham Mohamed; Stéphane Marchand-Maillet
MapReduce is a programming model proposed to simplify large-scale data processing. In contrast, the Message Passing Interface (MPI) standard is extensively used for algorithmic parallelization, as it provides an efficient communication infrastructure. In the original implementation of MapReduce, the reduce function can start processing only after the map function has terminated. If the map function is slow for any reason, the whole running time suffers. In this paper, we propose MapReduce overlapping using MPI, an adapted structure of the MapReduce programming model for fast data-intensive processing. Our implementation runs the map and reduce functions concurrently, exchanging partial intermediate data between them in a pipeline fashion using MPI. At the same time, we maintain the usability and simplicity of MapReduce. Experimental results on three applications (WordCount, Distributed Inverted Indexing and Distributed Approximate Similarity Search) show a good speedup compared to earlier versions of MapReduce such as Hadoop and the available MPI-MapReduce implementations.
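The overlapping idea described in this abstract can be illustrated with a minimal sketch. It is not the authors' MPI implementation: it assumes a single process in which a thread and a bounded queue stand in for MPI message exchange, and the function names (`map_words`, `reduce_counts`, `overlapped_word_count`) are hypothetical.

```python
import queue
import threading
from collections import Counter


def map_words(chunks, out_queue):
    """Map side: emit one partial word count per input chunk."""
    for chunk in chunks:
        out_queue.put(Counter(chunk.split()))
    out_queue.put(None)  # sentinel: no more intermediate data


def reduce_counts(in_queue, totals):
    """Reduce side: merge partial counts as soon as they arrive."""
    while True:
        partial = in_queue.get()
        if partial is None:
            break
        totals.update(partial)  # Counter.update adds the counts


def overlapped_word_count(chunks):
    # The bounded queue plays the role of the communication channel
    # (an assumption; the paper uses MPI between processes).
    pipe = queue.Queue(maxsize=4)
    totals = Counter()
    reducer = threading.Thread(target=reduce_counts, args=(pipe, totals))
    reducer.start()          # reduce starts before map has finished
    map_words(chunks, pipe)  # map streams partial results into the pipe
    reducer.join()
    return totals


result = overlapped_word_count(["a b a", "b c", "a"])
print(result)
```

The reducer consumes intermediate data while the mapper is still producing it, which is the property the abstract contrasts with the original map-then-reduce barrier.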
international conference on multimedia retrieval | 2011
Marc von Wyl; Hisham Mohamed; Eric Bruno; Stéphane Marchand-Maillet
Indexing web-scale multimedia is only possible by distributing storage and computing efforts. Most existing large-scale content-based indexing services do not offer interactive relevance feedback. Here, we propose a running demonstrator of our Cross-Modal Search Engine (CMSE), which implements a query-by-example search strategy with relevance feedback and is distributed over a cluster of 20 dual-core machines using MPI. We present the performance gain in terms of interactivity (search time) using a part of the ImageNet collection containing more than one million images as a base example.
similarity search and applications | 2013
Hisham Mohamed; Stéphane Marchand-Maillet
Similarity search, which translates into the nearest neighbor search problem, finds many applications in information retrieval and visualization, machine learning and data mining. The large volume of data that typical applications must handle makes approximate solutions to the similarity search problem necessary. Permutation-based indexing is one of the most recent techniques for approximate similarity search. Objects are represented by lists ordering their distances to a set of selected reference objects, following the idea that two neighboring objects have the same surroundings. In this paper, we propose a quantized representation of the permutation lists with its related data structure for effective retrieval. Our novel permutation-based indexing strategy is built to be fast, memory efficient and scalable without excessively sacrificing search precision. This is experimentally demonstrated in comparison to existing proposals using several large-scale datasets of millions of documents and different dimensions.
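The permutation representation described in this abstract can be sketched as follows. This is an illustrative example only: it assumes Euclidean distance and compares permutations with the Spearman footrule, a common choice in the permutation-based indexing literature. It does not show the quantized representation proposed in the paper, and the function names are hypothetical.

```python
import math


def permutation(obj, refs):
    """Return reference indices sorted by increasing distance to obj."""
    return sorted(range(len(refs)), key=lambda i: math.dist(obj, refs[i]))


def footrule(perm_a, perm_b):
    """Spearman footrule: sum over references of their rank differences."""
    pos_b = {ref: rank for rank, ref in enumerate(perm_b)}
    return sum(abs(rank - pos_b[ref]) for rank, ref in enumerate(perm_a))


refs = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]  # selected reference objects
p_query = permutation((1.0, 2.0), refs)
p_near = permutation((2.0, 3.0), refs)   # close to the query
p_far = permutation((9.0, 1.0), refs)    # far from the query

print(p_query, p_near, p_far)
print(footrule(p_query, p_near), footrule(p_query, p_far))
```

Objects that are close to each other see the reference objects in a similar order, so their permutations are close under the footrule, while a distant object yields a larger value.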
Information Systems | 2015
Hisham Mohamed; Stéphane Marchand-Maillet
The K-Nearest Neighbor (K-NN) search problem consists of finding the K objects closest and most similar to a given query. K-NN search is essential for many applications such as information retrieval and visualization, machine learning and data mining. The exponential growth of data makes approximate approaches to this problem necessary. Permutation-based indexing is one of the most recent techniques for approximate similarity search. Objects are represented by permutation lists ordering their distances to a set of selected reference objects, following the idea that two neighboring objects have the same surroundings. In this paper, we propose a novel quantized representation of permutation lists with its related data structure for effective retrieval on single and multicore architectures. Our novel permutation-based indexing strategy is built to be fast, memory efficient and scalable. This is experimentally demonstrated in comparison to existing proposals using several large-scale datasets of millions of documents and of different dimensions.
Highlights: Multi-core indexing and searching implementations of our data structure. Our proposal is tested on the full CoPhIR dataset of 106 million features. Our proposal is compared to all the available permutation-based indexing techniques on larger datasets (1 million and 10 million). Our proposal is compared to other approximate similarity search techniques such as LSH-Forest and AM-Tree.
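For reference, the exact K-NN problem that these approximate methods target can be written as a brute-force scan. The sketch below assumes Euclidean distance and a hypothetical `knn` helper; its cost grows linearly with the collection size, which is why approximate indexes become attractive for large datasets.

```python
import heapq
import math


def knn(query, objects, k):
    """Return the indices of the k objects closest to query (exact search)."""
    return heapq.nsmallest(
        k, range(len(objects)), key=lambda i: math.dist(query, objects[i])
    )


objects = [(0.0, 0.0), (5.0, 5.0), (1.0, 0.0), (0.0, 3.0)]
neighbors = knn((0.0, 0.5), objects, k=2)
print(neighbors)
```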
similarity search and applications | 2012
Hisham Mohamed; Stéphane Marchand-Maillet
We present parallel strategies for indexing and searching permutation-based indexes for high-dimensional data using inverted files. In this paper, three strategies for parallelization are discussed: posting lists decomposition, reference points decomposition, and multiple independent inverted files. We study the performance, efficiency, and effectiveness of our strategies on high-dimensional datasets of millions of images. Experimental results show good performance compared to the sequential version, with the same efficiency and effectiveness.
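One plausible reading of the reference points decomposition strategy is sketched below. It is an assumption rather than the paper's actual scheme: each worker builds the posting lists for its own subset of reference points, and a posting stores the object id together with the rank of that reference point in the object's permutation. Threads are used only to keep the example self-contained (in CPython they do not speed up this CPU-bound loop); all names are hypothetical.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def permutation(obj, refs):
    """Return reference indices sorted by increasing distance to obj."""
    return sorted(range(len(refs)), key=lambda i: math.dist(obj, refs[i]))


def build_partition(ref_ids, perms):
    """Build posting lists only for the reference points in ref_ids."""
    wanted = set(ref_ids)
    postings = {ref: [] for ref in ref_ids}
    for obj_id, perm in enumerate(perms):
        for position, ref in enumerate(perm):
            if ref in wanted:
                postings[ref].append((obj_id, position))
    return postings


def build_index(objects, refs, n_workers=2):
    perms = [permutation(obj, refs) for obj in objects]
    # Round-robin assignment of reference points to workers.
    parts = [list(range(w, len(refs), n_workers)) for w in range(n_workers)]
    index = {}
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for partial in pool.map(build_partition, parts, [perms] * n_workers):
            index.update(partial)  # partitions are disjoint, so no conflicts
    return index


refs = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
objects = [(1.0, 2.0), (9.0, 1.0)]
index = build_index(objects, refs)
print(index)
```

Because each reference point belongs to exactly one worker, the partial inverted files can be merged without any coordination between workers.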
international conference on parallel processing | 2012
Hisham Mohamed; Stéphane Marchand-Maillet
MapReduce is a programming model proposed by Google to simplify large-scale data processing. In contrast, the Message Passing Interface (MPI) standard is extensively used for algorithmic parallelization, as it provides an efficient communication infrastructure. In the original implementation of MapReduce, the reduce function can start processing only after the map function has terminated. If the map function is slow for any reason, the whole running time suffers. In this paper, we propose MapReduce overlapping using MPI, an adapted structure of the MapReduce programming model for fast data-intensive processing. Our implementation runs the map and reduce functions concurrently, exchanging partial intermediate data between them in a pipeline fashion using MPI. At the same time, we maintain the usability and simplicity of MapReduce. Experimental results on two applications (WordCount and Distributed Inverted Indexing) show a good speedup compared to earlier versions of MapReduce such as Hadoop and the available MPI-MapReduce implementations. For WordCount, we achieve 1.9x and 5.3x speedups compared to Hadoop and MPI-MapReduce, respectively, for 53Gb of data.
content based multimedia indexing | 2012
Hisham Mohamed; Stéphane Marchand-Maillet
Web-scale digital assets comprise millions or billions of documents. At this scale, sequential algorithms cannot cope with the data, and parallel and distributed computing become the solution of choice. MapReduce is a programming model proposed by Google for scalable data processing, and it is mainly applicable to data-intensive algorithms. In contrast, the Message Passing Interface (MPI) is suitable for high-performance algorithms. This paper proposes an adapted structure of the MapReduce programming model using MPI for multimedia indexing. Experimental results on a large number of text (XML) excerpts related to images from the ImageNet corpus indicate that our implementation achieves a good speedup compared to the sequential version and to earlier versions of MapReduce using MPI. Extensions to index large-scale multimedia collections are discussed.
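How the MapReduce model applies to indexing text excerpts can be shown with a small sequential sketch of inverted indexing. It illustrates the programming model only, not the MPI-based structure proposed in the paper; the documents and the helper names (`map_doc`, `shuffle`, `reduce_term`) are made up for the example.

```python
from collections import defaultdict


def map_doc(doc_id, text):
    """Map: emit one (term, doc_id) pair per distinct term in the document."""
    return [(term, doc_id) for term in set(text.lower().split())]


def shuffle(pairs):
    """Group intermediate pairs by key, as the framework would do."""
    groups = defaultdict(list)
    for term, doc_id in pairs:
        groups[term].append(doc_id)
    return groups


def reduce_term(term, doc_ids):
    """Reduce: turn the grouped ids into a sorted posting list."""
    return term, sorted(doc_ids)


docs = {1: "red cat", 2: "red dog", 3: "cat"}  # hypothetical text excerpts
pairs = [pair for doc_id, text in docs.items() for pair in map_doc(doc_id, text)]
inverted = dict(reduce_term(term, ids) for term, ids in shuffle(pairs).items())
print(inverted)
```

In a distributed setting the map calls run on different nodes and the shuffle step becomes network communication, which is where an MPI-based design can differ from Hadoop.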
similarity search and applications | 2014
Hisham Mohamed; Hasmik Osipyan; Stéphane Marchand-Maillet
Permutation-based indexing is a technique for approximating k-nearest neighbor computation in high-dimensional spaces. The technique aims to predict the proximity between elements by encoding their location with respect to their surroundings. The strategy answers user queries quickly and effectively. The main constraint of this technique is the indexing time. The opening of GPUs to general-purpose computation makes it possible to perform parallel computation on a powerful platform. In this paper, we propose efficient indexing algorithms for permutation-based indexing on GPUs and multi-core CPUs. We study the performance and efficiency of our algorithms on large-scale datasets of millions of documents. Experimental results show a decrease in the indexing time.
database and expert systems applications | 2013
Hisham Mohamed; Stéphane Marchand-Maillet
In this paper, we propose effective indexing and search algorithms for approximate K-NN based on an enhanced implementation of the Metric Suffix Array and Permutation-Based Indexing. Our main contribution is a sound, scalable strategy to prune objects based on the location of the reference objects in the query ordered lists. We study the performance and efficiency of our algorithms on large-scale datasets of millions of documents. Experimental results show a decrease in computational time while preserving the quality of the results.
content-based multimedia indexing | 2014
Hisham Mohamed; Hasmik Osipyan; Stéphane Marchand-Maillet
Searching for digital images in large-scale multimedia databases is a hard problem due to the rapid increase in digital assets. The Metric Permutation Table is an efficient data structure for large-scale multimedia indexing. This data structure is based on permutation-based indexing, which aims to predict the proximity between elements by encoding their location with respect to their surroundings. The main constraint of the Metric Permutation Table is the indexing time. With the exponential increase of multimedia data, parallel computation is needed. The opening of GPUs to general-purpose computation makes it possible to perform parallel computation on a powerful platform. In this paper, we propose efficient indexing and searching algorithms for the Metric Permutation Table using GPUs and multi-core CPUs. We study the performance and efficiency of our algorithms on large-scale datasets of millions of images. Experimental results show a decrease in the indexing time while preserving the quality of the results.