

Publication


Featured research published by Mohamed Morchid.


Pattern Recognition Letters | 2014

Feature selection using Principal Component Analysis for massive retweet detection

Mohamed Morchid; Richard Dufour; Pierre-Michel Bousquet; Georges Linarès; Juan-Manuel Torres-Moreno

Social networks have become a major actor in massive information propagation. In the context of the Twitter platform, its popularity is due in part to the capability of relaying messages (i.e. tweets) posted by users. This particular mechanism, called retweet, allows users to massively share tweets they consider potentially interesting for others. In this paper, we propose to study the behavior of tweets that have been massively retweeted in a short period of time. We first analyze specific tweet features through a Principal Component Analysis (PCA) to better understand the behavior of highly forwarded tweets as opposed to those retweeted only a few times. Finally, we propose to automatically detect the massively retweeted messages. The qualitative study is used to select the features allowing the best classification performance. We show that selecting only the most correlated features leads to the best classification accuracy (F-measure of 65.7%), with a gain of about 2.4 points in comparison to the use of the complete set of features.
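A minimal sketch of a PCA-guided feature selection pipeline in this spirit, using scikit-learn; the synthetic feature matrix, the two-component loading criterion and the SVM classifier are illustrative assumptions, not the exact features or setup of the paper.

```python
# Sketch of PCA-guided feature selection followed by classification.
# X stands in for a (n_tweets, n_features) matrix of tweet features and
# y for the "massively retweeted" label; both are synthetic placeholders.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))
y = (X[:, 0] + 0.5 * X[:, 3] + rng.normal(scale=0.5, size=1000) > 0).astype(int)

# 1) Analyse the features through a PCA on standardised data.
X_std = StandardScaler().fit_transform(X)
pca = PCA().fit(X_std)

# 2) Keep the features that load most strongly on the leading components
#    (an arbitrary stand-in for the paper's qualitative feature study).
loadings = np.abs(pca.components_[:2]).sum(axis=0)
selected = np.argsort(loadings)[-8:]

# 3) Compare classification with all features vs. the selected subset.
for name, cols in [("all features", slice(None)), ("selected subset", selected)]:
    f1 = cross_val_score(SVC(), X_std[:, cols], y, cv=5, scoring="f1").mean()
    print(f"{name}: F1 = {f1:.3f}")
```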


International Conference on Acoustics, Speech, and Signal Processing | 2014

Improving dialogue classification using a topic space representation and a Gaussian classifier based on the decision rule

Mohamed Morchid; Richard Dufour; Pierre-Michel Bousquet; Mohamed Bouallegue; Georges Linarès; Renato De Mori

In this paper, we study the impact of dialogue representations and classification methods on the task of theme identification of telephone conversation services having highly imperfect automatic transcriptions. Two dialogue representations are first compared: the classical Term Frequency-Inverse Document Frequency with Gini purity criteria (TF-IDF-Gini) method and the Latent Dirichlet Allocation (LDA) approach. We then propose to study an original classification method that takes advantage of the LDA topic space representation, highlighted as the best dialogue representation. To do so, two assumptions about topic representation led us to choose a Gaussian process (GP) based method. This approach is compared with a Support Vector Machine (SVM) classification method. Results show that the GP approach is a better solution to deal with the multiple theme complexity of a dialogue, regardless of the conditions studied (manual or automatic transcriptions). We finally discuss the impact of the topic space reduction on the classification accuracy.
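A hedged sketch of this kind of pipeline with scikit-learn: dialogues are mapped into an LDA topic space, then a Gaussian-process classifier is compared with an SVM. The toy dialogues, themes and number of topics are placeholders, not the real data or the exact Gaussian decision rule of the paper.

```python
# Dialogues -> LDA topic posteriors -> Gaussian-process vs. SVM classifier.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline

dialogues = ["lost my travel card on the bus",
             "how much is a monthly pass",
             "the metro was late again this morning",
             "I would like a refund for an unused ticket"]
themes = ["lost_item", "fares", "traffic", "fares"]

# Topic-space representation of each dialogue.
topic_space = make_pipeline(
    CountVectorizer(),
    LatentDirichletAllocation(n_components=3, random_state=0),
)
X_topics = topic_space.fit_transform(dialogues)   # (n_dialogues, n_topics)

# Compare the two classifiers on the topic-space representation.
for clf in (GaussianProcessClassifier(random_state=0), SVC()):
    clf.fit(X_topics, themes)
    print(type(clf).__name__, clf.predict(X_topics))
```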


Empirical Methods in Natural Language Processing | 2014

An I-vector Based Approach to Compact Multi-Granularity Topic Spaces Representation of Textual Documents

Mohamed Morchid; Mohamed Bouallegue; Richard Dufour; Georges Linarès; Driss Matrouf; Renato De Mori

Various studies have highlighted that topic-based approaches give a powerful spoken content representation of documents. Nonetheless, these documents may contain more than one main theme, and their automatic transcription inevitably contains errors. In this study, we propose an original and promising framework based on a compact representation of a textual document, to solve issues related to topic space granularity. Firstly, various topic spaces are estimated with different numbers of classes from a Latent Dirichlet Allocation. Then, this multiple topic space representation is compacted into an elementary segment, called c-vector, originally developed in the context of speaker recognition. Experiments are conducted on the DECODA corpus of conversations. Results show the effectiveness of the proposed multi-view compact representation paradigm. Our identification system reaches an accuracy of 85%, with a significant gain of 9 points compared to the baseline (best single topic space configuration).
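A minimal sketch of the multi-granularity step described above, assuming scikit-learn's LDA: the same documents are projected into several topic spaces of different sizes and the posteriors are concatenated. The compaction of this expanded vector into a c-vector (borrowed from speaker-recognition factor analysis) is not reproduced here; a simple factor-analysis stand-in is sketched after the journal version below. The toy documents and topic sizes are illustrative.

```python
# Project the same documents into LDA topic spaces of several granularities
# and concatenate the posteriors into one expanded multi-view vector.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = ["ticket refund request", "bus schedule complaint",
        "lost luggage on the metro", "price of a weekly pass"]
counts = CountVectorizer().fit_transform(docs)

views = []
for n_topics in (5, 10, 20):                       # different granularities
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
    views.append(lda.fit_transform(counts))        # (n_docs, n_topics)

multi_view = np.hstack(views)                      # expanded representation
print(multi_view.shape)                            # (4, 35)
```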


Spoken Language Technology Workshop | 2014

Author-topic based representation of call-center conversations

Mohamed Morchid; Richard Dufour; Mohamed Bouallegue; Georges Linarès

Performance of Automatic Speech Recognition (ASR) systems drops dramatically when transcribing conversations recorded in noisy conditions. Speech analytics suffer from this poor automatic transcription quality. To tackle this difficulty, a solution consists in mapping transcriptions into a space of hidden topics. This abstract representation helps to mitigate the drawbacks of the ASR process. The best-known and most commonly used is the topic-based representation obtained from a Latent Dirichlet Allocation (LDA). Several studies demonstrate the effectiveness and reliability of this high-level representation. During the LDA learning process, the distribution of words in each topic is estimated automatically. Nonetheless, in the context of a classification task, no consideration is made for the targeted classes. Thus, if the targeted application is to find out the main theme related to a dialogue, this information should be taken into consideration. In this paper, we propose to compare a classical topic-based representation of a dialogue with a new one based not only on the dialogue content itself (words), but also on the theme related to the dialogue. This original representation is based on the author-topic (AT) model. The effectiveness of the proposed representation is evaluated on a classification task from automatic dialogue transcriptions between an agent and a customer of the Paris Transportation Company. Experiments confirm that this author-topic model approach far outperforms the classical topic-based representation, with a substantial gain of more than 7% in terms of correctly labeled conversations.
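A hedged sketch of the author-topic idea, assuming gensim's AuthorTopicModel and treating each dialogue theme as an "author"; the toy dialogues, themes and number of topics are placeholders, not the Paris Transportation Company data.

```python
# Author-topic model where the dialogue themes play the role of authors,
# so topic distributions are tied to the targeted classes, not only to words.
from gensim.corpora import Dictionary
from gensim.models import AuthorTopicModel

dialogues = [["lost", "card", "bus"],
             ["price", "monthly", "pass"],
             ["metro", "late", "traffic"],
             ["refund", "unused", "ticket"]]
theme2docs = {"lost_item": [0], "fares": [1, 3], "traffic": [2]}

dictionary = Dictionary(dialogues)
corpus = [dictionary.doc2bow(d) for d in dialogues]

at_model = AuthorTopicModel(corpus=corpus, author2doc=theme2docs,
                            id2word=dictionary, num_topics=3, random_state=0)

# Theme-level topic mixtures, usable as a theme-aware dialogue representation.
for theme in theme2docs:
    print(theme, at_model.get_author_topics(theme))
```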


IEEE Transactions on Audio, Speech, and Language Processing | 2015

Compact multiview representation of documents based on the total variability space

Mohamed Morchid; Mohamed Bouallegue; Richard Dufour; Georges Linarès; Driss Matrouf; Renato De Mori

Mapping text documents in an LDA-based topic space is a classical way to extract a high-level representation of text documents. Unfortunately, LDA is highly sensitive to hyper-parameters related to the number of classes, or word and topic distribution, and there is no systematic way to pre-estimate optimal configurations. Moreover, various hyper-parameter configurations offer complementary views on the document. In this paper, we propose a method based on a two-step process that, first, expands the representation space by using a set of topic spaces and, second, compacts the representation space by removing poorly relevant dimensions. These two steps are based respectively on multi-view LDA-based representation spaces and factor-analysis models. This model provides a view-independent representation of documents while extracting complementary information from a massive multi-view representation. Experiments are conducted on the DECODA conversation corpus and the Reuters-21578 textual dataset. Results show the effectiveness of the proposed multi-view compact representation paradigm. The proposed categorization system reaches an accuracy of 86.5% with automatic transcriptions of conversations from the DECODA corpus and a Macro-F1 of 80% on a classification task over the well-known Reuters-21578 corpus, with a significant gain compared to the baseline (best single topic space configuration), as well as to methods and document representations previously studied.
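A small sketch of the compaction step under stated assumptions: the paper's total-variability (i-vector style) model is replaced here by scikit-learn's FactorAnalysis as a simple stand-in, and the concatenated topic posteriors are random placeholders.

```python
# Compact an expanded multi-view vector (concatenated topic posteriors)
# into a low-dimensional, view-independent document representation.
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
multi_view = rng.dirichlet(np.ones(35), size=200)   # placeholder for n_docs x 35

compactor = FactorAnalysis(n_components=10, random_state=0)
compact_docs = compactor.fit_transform(multi_view)  # (200, 10) compact vectors
print(compact_docs.shape)
```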


6th International Workshop on Spoken Dialog Systems (IWSDS 2015) | 2015

Integration of Word and Semantic Features for Theme Identification in Telephone Conversations

Yannick Estève; Mohamed Bouallegue; Carole Lailler; Mohamed Morchid; Richard Dufour; Georges Linarès; Driss Matrouf; Renato De Mori

The paper describes research on the possibility of integrating different types of word and semantic features for automatically identifying themes of real-life telephone conversations in a customer care service (CCS). The features are all the words of the application vocabulary, the probabilities obtained with latent Dirichlet allocation (LDA) for selected discriminative words, and semantic features obtained with limited human supervision from words and patterns expressing entities and relations of the application ontology. A deep neural network (DNN) is proposed for integrating these features. Experimental results on manual and automatic conversation transcriptions are presented, showing the effective contribution of the integration. The results show how to automatically select a large subset of the test corpus with high precision and recall, making it possible to automatically obtain theme mention proportions in different time periods.
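A minimal sketch of the feature-integration idea with a small feed-forward network; all three feature blocks are random placeholders (the real system uses the application vocabulary, LDA probabilities of discriminative words and ontology-based features), and scikit-learn's MLPClassifier stands in for the DNN.

```python
# Concatenate word, LDA and semantic feature blocks and feed a small DNN.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
n_conversations = 300
word_feats = rng.random((n_conversations, 50))                # vocabulary features
lda_feats = rng.dirichlet(np.ones(25), size=n_conversations)  # LDA probabilities
semantic_feats = rng.integers(0, 2, (n_conversations, 10))    # entity/relation flags
themes = rng.integers(0, 8, n_conversations)                  # conversation themes

X = np.hstack([word_feats, lda_feats, semantic_feats])
dnn = MLPClassifier(hidden_layer_sizes=(128, 64), max_iter=300, random_state=0)
dnn.fit(X, themes)
print("training accuracy:", dnn.score(X, themes))
```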


IEEE Automatic Speech Recognition and Understanding Workshop | 2015

Topic-space based setup of a neural network for theme identification of highly imperfect transcriptions

Mohamed Morchid; Richard Dufour; Georges Linarès

This paper presents a method for speech analytics that integrates a topic-space based representation into a feed-forward artificial neural network (FFANN) working as a document classifier. The proposed method consists in configuring the FFANN's topology and in initializing the weights according to a previously estimated topic space. A setup based on thematic priors is expected to improve the efficiency of the FFANN's weight optimization process, while speeding up training and improving the classification accuracy. This method is evaluated on a spoken dialogue categorization task composed of customer-agent dialogues from the call centre of the Paris Public Transportation Company. Results show the benefit of the proposed setup method, with a gain of more than 4 points in terms of classification accuracy compared to the baseline. Moreover, experiments highlight that performance is weakly dependent on the FFANN's topology with the LDA-based configuration, in comparison to a classical empirical setup.
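A hedged sketch of the topic-space setup with PyTorch and scikit-learn's LDA: the first hidden layer is sized and initialised from the topic-word matrix, so each hidden unit starts as one topic. The toy corpus, the sigmoid activation and the 8-theme output layer are assumptions, not the paper's exact configuration.

```python
# Configure and initialise a feed-forward classifier from an LDA topic space.
import torch
import torch.nn as nn
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = ["lost card on the bus", "monthly pass price", "metro delay complaint",
        "refund for an unused ticket", "bus schedule information"]
counts = CountVectorizer().fit_transform(docs)

lda = LatentDirichletAllocation(n_components=4, random_state=0).fit(counts)
topic_word = torch.tensor(lda.components_, dtype=torch.float32)  # (n_topics, |V|)
n_topics, vocab_size = topic_word.shape

model = nn.Sequential(
    nn.Linear(vocab_size, n_topics),   # topology follows the topic space
    nn.Sigmoid(),
    nn.Linear(n_topics, 8),            # 8 themes, illustrative
)
with torch.no_grad():
    model[0].weight.copy_(topic_word)  # thematic priors as initial weights
    model[0].bias.zero_()
# Training then proceeds as for any feed-forward document classifier.
```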


Spoken Language Technology Workshop | 2016

Quaternion Neural Networks for Spoken Language Understanding

Titouan Parcollet; Mohamed Morchid; Pierre-Michel Bousquet; Richard Dufour; Georges Linarès; Renato De Mori

Machine Learning (ML) techniques have enabled large performance improvements on various challenging Spoken Language Understanding (SLU) tasks. Among these methods, Neural Networks (NN), or Multilayer Perceptrons (MLP), have recently received great interest from researchers due to their capability to represent complex internal structures in a low-dimensional subspace. However, MLPs employ document representations based on basic word-level or topic-based features. These basic representations therefore reveal little in the way of document statistical structure, since they only consider the words or topics contained in the document as a “bag-of-words”, ignoring relations between them. We propose to remedy this weakness by extending the complex features based on quaternion algebra presented in [1] to neural networks, called QMLP. This original QMLP approach relies on hyper-complex algebra to take feature dependencies in documents into consideration. New document features, based on the document structure itself and used as input of the QMLP, are also investigated in this paper, in comparison to those initially proposed in [1]. Experiments on an SLU task from a real framework of human spoken dialogues showed that our QMLP approach, combined with the proposed document features, outperforms other approaches, with an accuracy gain of 2% with respect to the MLP based on real numbers and of more than 3% with respect to the first quaternion-based features proposed in [1]. We finally show that fewer iterations are needed for our QMLP architecture to be effective and to reach promising accuracies.
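A generic illustration of the core operation behind a quaternion layer, the Hamilton product, which ties the four components of each feature together instead of treating them as independent real values; this is the textbook formula, not the authors' exact QMLP implementation.

```python
# Hamilton product of two quaternions given as (r, x, y, z) NumPy arrays.
import numpy as np

def hamilton_product(q, p):
    """Quaternion product q * p; the non-commutative mixing of components
    is what lets a quaternion neuron model dependencies between features."""
    r1, x1, y1, z1 = q
    r2, x2, y2, z2 = p
    return np.array([
        r1 * r2 - x1 * x2 - y1 * y2 - z1 * z2,
        r1 * x2 + x1 * r2 + y1 * z2 - z1 * y2,
        r1 * y2 - x1 * z2 + y1 * r2 + z1 * x2,
        r1 * z2 + x1 * y2 - y1 * x2 + z1 * r2,
    ])

# One quaternion "weight" applied to one quaternion input, e.g. four related
# document features packed into a single hyper-complex value.
w = np.array([0.5, -0.1, 0.3, 0.2])
x = np.array([1.0, 0.2, -0.4, 0.1])
print(hamilton_product(w, x))
```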


Computer Speech & Language | 2016

Impact of Word Error Rate on theme identification task of highly imperfect human-human conversations

Mohamed Morchid; Richard Dufour; Georges Linarès

Highlights: Review of the impact of dialogue representations and classification methods. Discussion of the impact of discriminative words in terms of transcription accuracy. Original study evaluating the impact of the WER in the LDA topic space.

A review is proposed of the impact of word representations and classification methods on the task of theme identification of telephone conversation services having highly imperfect automatic transcriptions. We first compare two word-based representations using the classical Term Frequency-Inverse Document Frequency with Gini purity criteria (TF-IDF-Gini) method and the latent Dirichlet allocation (LDA) approach. We then introduce a classification method that takes advantage of the LDA topic space representation, highlighted as the best word representation. To do so, two assumptions about topic representation led us to choose a Gaussian Process (GP) based method. Its performance is compared with a classical Support Vector Machine (SVM) classification method. Experiments showed that the GP approach is a better solution to deal with the multiple theme complexity of a dialogue, regardless of the conditions studied (manual or automatic transcriptions) (Morchid et al., 2014). In order to better understand the results obtained using different word representation methods and classification approaches, we then discuss the impact of discriminative and non-discriminative words extracted by both word representation methods in terms of transcription accuracy (Morchid et al., 2014). Finally, we propose a novel study that evaluates the impact of the Word Error Rate (WER) in the LDA topic space learning process as well as during the theme identification task. This original qualitative study points out that selecting a small subset of words having the lowest WER (instead of using all the words) allows the system to better classify automatic transcriptions, with an absolute gain of 0.9 points in comparison to the best performance achieved on this dialogue classification task (precision of 83.3%).
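A tiny sketch of the word-selection idea studied here: keep only the words whose error rate on the automatic transcripts falls below a threshold before building the topic space. The per-word WER table and the threshold are made-up placeholders.

```python
# Filter the vocabulary to reliably transcribed words before LDA training.
word_error_rate = {"refund": 0.05, "ticket": 0.08, "uh": 0.60,
                   "metro": 0.10, "hmm": 0.55, "pass": 0.12}
max_wer = 0.20   # illustrative threshold

reliable_vocab = sorted(w for w, wer in word_error_rate.items() if wer <= max_wer)
print(reliable_vocab)   # vocabulary kept for topic-space learning and classification
```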


Conference of the International Speech Communication Association | 2016

Deep Stacked Autoencoders for Spoken Language Understanding.

Killian Janod; Mohamed Morchid; Richard Dufour; Georges Linarès; Renato De Mori

The automatic transcription process of spoken documents results in numerous word errors, especially when very noisy conditions are encountered. Document representations based on neural embedding frameworks have recently shown significant improvements in different Spoken and Natural Language Understanding tasks such as denoising and filtering. Nonetheless, these methods mainly require clean representations and fail to properly remove the noise contained in noisy representations. This paper proposes to study the impact of residual noise contained in automatic transcripts of spoken dialogues on highly abstract spaces obtained from deep neural networks. The paper makes the assumption that relying on representations learned from “clean” manual transcripts of spoken documents dramatically degrades the performance of theme identification systems in noisy conditions. The proposed deep neural network takes, as input and output, highly imperfect transcripts from spoken dialogues to improve the robustness of the document representation in a noisy environment. Results obtained on the DECODA theme classification task of dialogues reach an accuracy of 82%, with a significant gain of about 5%.
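A hedged PyTorch sketch of the idea above: an autoencoder whose input and reconstruction target are both built from noisy (automatic) transcript representations, so the bottleneck learns a representation that is robust to residual ASR noise. The dimensions, the bag-of-words input and the training loop are illustrative, not the DECODA setup.

```python
# Stacked autoencoder trained with noisy transcripts as input AND target.
import torch
import torch.nn as nn

vocab_size, hidden, bottleneck = 2000, 256, 64

autoencoder = nn.Sequential(
    nn.Linear(vocab_size, hidden), nn.ReLU(),
    nn.Linear(hidden, bottleneck), nn.ReLU(),     # abstract document representation
    nn.Linear(bottleneck, hidden), nn.ReLU(),
    nn.Linear(hidden, vocab_size), nn.Sigmoid(),
)

noisy_docs = torch.rand(32, vocab_size)           # stand-in for ASR bag-of-words
optimizer = torch.optim.Adam(autoencoder.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for _ in range(10):                               # a few illustrative epochs
    optimizer.zero_grad()
    reconstruction = autoencoder(noisy_docs)
    loss = loss_fn(reconstruction, noisy_docs)    # noisy input, noisy target
    loss.backward()
    optimizer.step()

features = autoencoder[:4](noisy_docs)            # bottleneck features for theme classification
```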
