Mohamed G. Elfeky | Researchain

Publication

Featured research published by Mohamed G. Elfeky.

very large data bases | 2003

A Bayesian decision model for cost optimal record matching

Vassilios S. Verykios; George V. Moustakides; Mohamed G. Elfeky

Abstract. In an error-free system with perfectly clean data, the construction of a global view of the data consists of linking (in relational terms, joining) two or more tables on their key fields. Unfortunately, most of the time, these data are neither carefully controlled for quality nor necessarily defined commonly across different data sources. As a result, the creation of such a global data view resorts to approximate joins. In this paper, an optimal solution is proposed for the matching or the linking of database record pairs in the presence of inconsistencies, errors or missing values in the data. Existing models for record matching rely on decision rules that minimize the probability of error, that is, the probability that a sample (a measurement vector) is assigned to the wrong class. In practice, though, minimizing the probability of error is not the best criterion to design a decision rule, because the misclassifications of different samples may have different consequences. In this paper we present a decision model that minimizes the cost of making a decision. In particular: (a) we present a decision rule; (b) we prove that this rule is optimal with respect to the cost of a decision; and (c) we compute the probabilities of the two types of errors (Type I and Type II) that are incurred when this rule is applied. We also present a closed form decision model for a certain class of record comparison pairs along with an example, and results from comparing the proposed cost-based model to the error-based model, for large record comparison spaces.
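
The abstract contrasts error-minimizing and cost-minimizing decision rules. As a rough illustration only, and not the paper's own derivation, the sketch below applies the textbook two-class Bayes minimum-cost rule to one record pair. It assumes binary field-agreement vectors, conditionally independent fields, hypothetical per-field probabilities `m` (agreement given a true match) and `u` (agreement given a non-match), a hypothetical match prior, and a hypothetical cost table.

```python
from math import prod


def likelihood(gamma, probs):
    """P(gamma | class) for a binary agreement vector, assuming independent fields."""
    return prod(p if g else 1.0 - p for g, p in zip(gamma, probs))


def decide(gamma, m, u, prior_match, cost):
    """Return 'match' or 'non-match', whichever has the lower expected cost.

    m, u, prior_match and cost are hypothetical inputs; cost[(d, c)] is the
    cost of deciding d ('match' / 'non-match') when the true class is c ('M' / 'U').
    """
    p_m = prior_match * likelihood(gamma, m)          # joint P(gamma, M)
    p_u = (1.0 - prior_match) * likelihood(gamma, u)  # joint P(gamma, U)
    risk_match = cost[("match", "M")] * p_m + cost[("match", "U")] * p_u
    risk_non = cost[("non-match", "M")] * p_m + cost[("non-match", "U")] * p_u
    return "match" if risk_match < risk_non else "non-match"
```

With `m = [0.9, 0.8, 0.95]`, `u = [0.1, 0.2, 0.05]`, a match prior of 0.1, a false-match cost of 1 and a missed-match cost of 5, the partially agreeing pair `(1, 0, 0)` is classified as a non-match; raising the missed-match cost to 100 flips it to a match, which a rule that only minimizes the probability of error would not do.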

international conference on data mining | 2005

WARP: time warping for periodicity detection

Mohamed G. Elfeky; Walid G. Aref; Ahmed K. Elmagarmid

Periodicity mining is used for predicting trends in time series data. Periodicity detection is an essential process in periodicity mining to discover potential periodicity rates. Existing periodicity detection algorithms do not take into account the presence of noise, which is inevitable in almost all real-world time series data. In this paper, we tackle the problem of periodicity detection in the presence of noise. We propose a new periodicity detection algorithm that deals efficiently with all types of noise. Based on time warping, the proposed algorithm warps (extends or shrinks) the time axis at various locations to optimally remove the noise. Experimental results show that the proposed algorithm outperforms the existing periodicity detection algorithms in terms of noise resiliency.
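
One reading of the abstract's idea: if a series has period p, it should align cheaply with a copy of itself shifted by p, and time warping lets that alignment absorb inserted or dropped points. The sketch below scores each candidate p by a plain dynamic time warping (DTW) distance between the series and its p-shifted copy, normalized by the overlap length. The normalization and the smallest-period tie-breaking are assumptions, and the exhaustive search is cubic in the series length, so it says nothing about WARP's actual efficiency.

```python
def dtw(a, b):
    """Classic dynamic time warping distance with absolute-difference cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of: step in a, step in b, step in both
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def candidate_periods(x, max_period=None):
    """Score each period p by DTW between x and x shifted by p (lower is better)."""
    n = len(x)
    max_period = max_period or n // 2
    scores = {}
    for p in range(1, max_period + 1):
        scores[p] = dtw(x[p:], x[:-p]) / (n - p)  # normalize by overlap length (assumption)
    return scores


def best_period(x):
    """Lowest-scoring period; ties go to the smallest p (assumption)."""
    scores = candidate_periods(x)
    return min(scores, key=lambda p: (scores[p], p))
```

For the clean series `[1, 2, 3] * 10`, shifts 3, 6, 9, ... all score 0 and the smallest, 3, is returned.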

international conference on data mining | 2010

Mining Arabic Business Reviews

Mohamed Elhawary; Mohamed G. Elfeky

For languages with rich content on the web, business reviews are easily accessible via many well-known websites, e.g., Yelp.com. For languages with poor web content, like Arabic, there are very few websites that provide business reviews (we are aware of only one, and it is not popular). However, this does not mean that such reviews do not exist. They do exist, in unstructured form, on websites not originally intended for reviews, e.g., forums and blogs. Hence, there is a need to mine those Arabic reviews from the web in order to provide them in the search results when a user searches for a business or a category of businesses. In this paper, we show how to extract the business reviews written in Arabic that are scattered across the web. The mined reviews are also analyzed to determine their sentiment (positive, negative or neutral). This way, we provide users with the information they need about local businesses in the language they understand, and therefore provide a better search experience for the Middle East region, where Arabic is the predominant language.

extending database technology | 2004

Using Convolution to Mine Obscure Periodic Patterns in One Pass

Mohamed G. Elfeky; Walid G. Aref; Ahmed K. Elmagarmid

The mining of periodic patterns in time series databases is an interesting data mining problem that can be envisioned as a tool for forecasting and predicting the future behavior of time series data. Existing periodic pattern mining algorithms either assume that the periodic rate (or simply the period) is user-specified, or try to detect potential values for the period in a separate phase. The former assumption is a considerable disadvantage, especially in time series databases where the period is not known a priori. The latter approach results in a multi-pass algorithm, which should be avoided in online environments (e.g., data streams). In this paper, we develop an algorithm that mines periodic patterns in time series databases with unknown or obscure periods such that discovering the period is part of the mining process. Based on convolution, our algorithm requires only one pass over a time series of length n, with O(n log n) time complexity.
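
The convolution idea can be illustrated by counting, for every shift p at once, how many positions satisfy `series[i] == series[i + p]`: per symbol, this is the autocorrelation of a 0/1 indicator sequence, which an FFT computes in O(n log n). The sketch below uses a hand-written radix-2 FFT to stay dependency-free. It handles each of the σ distinct symbols separately, so it costs O(σ n log n) rather than the bound the abstract reports, and it does not reproduce the paper's symbol encoding or its mining of the periodic patterns themselves.

```python
import cmath


def fft(a, invert=False):
    """Recursive radix-2 FFT; len(a) must be a power of two. Inverse is unscaled."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def match_counts(series):
    """counts[p] = number of i with series[i] == series[i + p], for 0 <= p < n."""
    n = len(series)
    size = 1
    while size < 2 * n:  # zero-pad so circular correlation equals linear correlation
        size *= 2
    counts = [0.0] * n
    for sym in set(series):
        ind = [1.0 if s == sym else 0.0 for s in series] + [0.0] * (size - n)
        spec = fft(ind)
        power = [f * f.conjugate() for f in spec]  # |F|^2 <-> autocorrelation
        corr = fft(power, invert=True)
        for p in range(n):
            counts[p] += corr[p].real / size       # scale the inverse transform
    return [round(c) for c in counts]


def period_confidence(series, p):
    """Fraction of aligned positions that repeat after p steps."""
    return match_counts(series)[p] / (len(series) - p)
```

For `"abcabcabc"`, `match_counts` gives 6 matches at shift 3 over 6 aligned positions, so `period_confidence("abcabcabc", 3)` is 1.0.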

international conference on data mining | 2006

STAGGER: Periodicity Mining of Data Streams Using Expanding Sliding Windows

Mohamed G. Elfeky; Walid G. Aref; Ahmed K. Elmagarmid

Sensor devices are becoming ubiquitous, especially in measurement and monitoring applications. Because of the real-time, append-only and semi-infinite nature of the generated sensor data streams, an online incremental approach is a necessity for mining stream data. In this paper, we propose STAGGER: a one-pass, online and incremental algorithm for mining periodic patterns in data streams. STAGGER does not require that the user pre-specify the periodicity rate of the data. Instead, STAGGER discovers the potential periodicity rates. STAGGER maintains multiple expanding sliding windows staggered over the stream, where computations are shared among the multiple overlapping windows. Small-length sliding windows are imperative for early and real-time output, yet are limited to discovering short periodicity rates. As streamed data arrives continuously, the sliding windows expand in length in order to cover the whole stream. Larger-length sliding windows are able to discover longer periodicity rates. STAGGER incrementally maintains a tree-like data structure for the frequent periodic patterns of each discovered potential periodicity rate. In contrast to the Fourier/Wavelet-based approaches used for discovering periodicity rates, STAGGER not only discovers a wider, more accurate set of periodicities, but also discovers the periodic patterns themselves. In fact, experimental results with real and synthetic data sets show that STAGGER outperforms Fourier/Wavelet-based approaches by an order of magnitude in terms of the accuracy of the discovered periodicity rates. Moreover, real-data experiments demonstrate the practicality of the discovered periodic patterns.
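
A heavily simplified sketch of the expanding-window intuition, not the STAGGER algorithm itself: match counts for every candidate period are updated incrementally as each symbol arrives, and a period p is only reported once the stream holds at least 2p symbols, so short periods surface early and longer ones appear as the stream grows. The class name and the 0.9 confidence threshold are hypothetical. STAGGER's computation sharing across staggered windows and its tree of frequent periodic patterns are not modelled, and this version keeps the whole history, so each update costs time linear in the stream length.

```python
from collections import defaultdict


class ExpandingPeriodTracker:
    """Hypothetical one-pass, incremental periodicity scorer for a symbol stream."""

    def __init__(self):
        self.history = []
        self.matches = defaultdict(int)  # p -> count of i with x[i] == x[i - p]

    def push(self, symbol):
        """Consume one symbol, updating match counts for every period seen so far."""
        t = len(self.history)
        for p in range(1, t + 1):
            if self.history[t - p] == symbol:
                self.matches[p] += 1
        self.history.append(symbol)

    def confidences(self, threshold=0.9):
        """Periods up to half the current stream length whose match ratio >= threshold."""
        n = len(self.history)
        result = {}
        for p in range(1, n // 2 + 1):
            conf = self.matches[p] / (n - p)
            if conf >= threshold:
                result[p] = conf
        return result
```

After the six symbols `abcabc` only period 3 is reported; after twelve, period 6 appears as well.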

Proceedings of the International Symposium on Objects and Databases | 2000

ODMQL: Object Data Mining Query Language

Mohamed G. Elfeky; Amani A. Saad; Souheir A. Fouad

Data mining is the discovery of knowledge and useful information from large amounts of data stored in databases. The emerging data mining tools and systems have led to the demand for a powerful data mining query language. The concepts of such a language for relational databases have been discussed before. With the increasing popularity of object-oriented databases, it is important to design a data mining query language for such databases. The main objective of this paper is to propose an Object Data Mining Query Language (ODMQL) for object-oriented databases as an extension to the Object Query Language (OQL) proposed by the Object Data Management Group (ODMG) as a standard query language for object-oriented databases. The proposed language is implemented as a feature of an experimental object-oriented database management system that is developed as a testbed for research issues of object-oriented databases.

international conference on acoustics, speech, and signal processing | 2016

Selection and combination of hypotheses for dialectal speech recognition

Victor Soto; Olivier Siohan; Mohamed G. Elfeky; Pedro J. Moreno

While research has often shown that building dialect-specific Automatic Speech Recognizers is the optimal approach to dealing with dialectal variations of the same language, we have observed that dialect-specific recognizers do not always output the best recognitions. Often enough, another dialectal recognizer outputs a better recognition than the dialect-specific one. In this paper, we present two methods to select and combine the best decoded hypothesis from a pool of dialectal recognizers. We follow a Machine Learning approach and extract features from the Speech Recognition output along with Word Embeddings and use Shallow Neural Networks for classification. Our experiments using Dictation and Voice Search data from the four main Arabic dialects show good WER improvements for the hypothesis selection scheme, reducing the WER by 2.1 to 12.1% depending on the test set, and promising results for the hypothesis combination scheme.
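
WER here is word error rate: the word-level edit distance (substitutions, deletions, insertions) between a hypothesis and the reference, divided by the number of reference words. The sketch below computes it and uses it for an oracle selection among dialectal recognizers, i.e., the choice a hypothesis-selection classifier tries to approximate without access to the reference. The recognizer names in the usage example are hypothetical, and the paper's features and shallow neural networks are not reproduced.

```python
def word_errors(reference, hypothesis):
    """Minimum substitutions + deletions + insertions between the word sequences."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,              # deletion of a reference word
                         cur[j - 1] + 1,           # insertion of a hypothesis word
                         prev[j - 1] + (r != h))   # substitution, or free match
        prev = cur
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate relative to the reference length."""
    return word_errors(reference, hypothesis) / len(reference.split())


def oracle_select(reference, hypotheses):
    """Name of the recognizer whose hypothesis has the lowest WER (needs the reference)."""
    return min(hypotheses, key=lambda name: wer(reference, hypotheses[name]))
```

For example, `oracle_select("the cat sat", {"egyptian": "the cat sat", "gulf": "a cat sat"})` returns `"egyptian"`.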

spoken language technology workshop | 2016

Towards acoustic model unification across dialects

Mohamed G. Elfeky; Meysam Bastani; Xavier Velez; Pedro J. Moreno; Austin Waters

Acoustic model performance typically decreases when evaluated on a dialectal variation of the same language that was not used during training. Similarly, models simultaneously trained on a group of dialects tend to underperform dialect-specific models. In this paper, we report on our efforts towards building a unified acoustic model that can serve a multi-dialectal language. Two techniques are presented: Distillation and MultiTask Learning (MTL). In Distillation, we use an ensemble of dialect-specific acoustic models and distill its knowledge into a single model. In MTL, we utilize multitask learning to train a unified acoustic model that learns to distinguish dialects as a side task. We show that both techniques are superior to the jointly-trained model that is trained on all dialectal data, reducing word error rates by 4.2% and 0.6%, respectively. While achieving this improvement, neither technique degrades the performance of the dialect-specific models by more than 3.4%.
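
Distilling an ensemble into one model is commonly done by training the student against the ensemble's averaged output posteriors (soft targets). The sketch below computes that loss for a single frame, assuming uniform averaging over the dialect-specific teachers and an optional softmax temperature. The abstract does not say whether the paper uses a temperature, weights the teachers, or mixes in hard labels, so treat those choices as assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    scaled = [z / temperature for z in logits]
    top = max(scaled)
    exps = [math.exp(z - top) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_targets(teacher_logits, temperature=1.0):
    """Average the teachers' posteriors for one frame (uniform weights: an assumption)."""
    posteriors = [softmax(logits, temperature) for logits in teacher_logits]
    k = len(posteriors)
    return [sum(col) / k for col in zip(*posteriors)]


def distillation_loss(student_logits, teacher_logits, temperature=1.0):
    """Cross-entropy of the student's posterior against the averaged teacher posterior."""
    targets = ensemble_targets(teacher_logits, temperature)
    student = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(targets, student))
```

The loss is smallest when the student's posterior matches the averaged teacher posterior: with two teachers that both favor class 0, a student that also favors class 0 receives a lower loss than one favoring class 1.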

Procedia Computer Science | 2018

Multi-Dialectical Languages Effect on Speech Recognition: Too Much Choice Can Hurt

Mohamed G. Elfeky; Pedro J. Moreno; Victor Soto

Abstract. Research has shown that automatic speech recognition (ASR) performance typically decreases when evaluated on a dialectal variation of the same language that was not used for training its models. Similarly, models simultaneously trained on a group of dialects tend to underperform when compared to dialect-specific models. When trying to decide which dialect-specific model (recognizer) to use to decode an utterance (e.g., a voice search query), possible strategies include automatically detecting the spoken dialect or following the user’s language preferences as set in his/her cell phone. In this paper, we observe that users’ voice search queries are usually directed to a dialect-specific recognizer that does not match the user’s current location, and present a study that shows that automatically selecting the recognizer based on the user’s geographical location helps improve the user experience.

international conference on data engineering | 2002