Ramy Baly
American University of Beirut
Publications
Featured research published by Ramy Baly.
Empirical Methods in Natural Language Processing | 2014
Gilbert Badaro; Ramy Baly; Hazem M. Hajj; Nizar Habash; Wassim El-Hajj
Most opinion mining methods in English rely successfully on sentiment lexicons, such as English SentiWordnet (ESWN). While there have been efforts towards building Arabic sentiment lexicons, they suffer from many deficiencies: limited size, an unclear usability plan given Arabic’s rich morphology, or a lack of public availability. In this paper, we address all of these issues and produce the first publicly available large-scale Standard Arabic sentiment lexicon (ArSenL) using a combination of existing resources: ESWN, Arabic WordNet, and the Standard Arabic Morphological Analyzer (SAMA). We compare and combine two methods of constructing this lexicon with an eye on insights for Arabic dialects and other low-resource languages. We also present an extrinsic evaluation in terms of subjectivity and sentiment analysis.
Meeting of the Association for Computational Linguistics | 2015
Gilbert Badaro; Ramy Baly; Rana Akel; Linda Fayad; Jeffrey Khairallah; Hazem M. Hajj; Khaled Bashir Shaban; Wassim El-Hajj
Most advanced mobile applications require server-based processing and communication. This often causes additional energy consumption on the already energy-limited mobile devices. In this work, we provide a method to address these limitations on the mobile device for opinion mining in Arabic. Instead of relying on compute-intensive NLP, the method uses an Arabic lexical resource stored on the device. Text is stemmed, and the words are then matched to our own lexicon, ArSenL. ArSenL is the first publicly available large-scale Standard Arabic sentiment lexicon, developed using a combination of English SentiWordnet (ESWN), Arabic WordNet, and the Arabic Morphological Analyzer (AraMorph). The scores from the matched stems are then processed through a classifier to determine the polarity. The method was tested on a published set of Arabic tweets, and an average accuracy of 67% was achieved. The developed mobile application is also made publicly available. The application takes as input a topic of interest and retrieves the latest Arabic tweets related to this topic. It then displays the tweets superimposed with colors representing the sentiment labels positive, negative, or neutral. The application also provides visual summaries of searched topics and a history showing how the sentiments for a certain topic have been evolving.
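The lexicon-based pipeline described in this abstract (stem the text, match stems against an on-device lexicon, then classify the aggregated scores) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the two lexicon entries and their scores are invented rather than taken from ArSenL, `toy_stem` is a hypothetical stand-in for a real Arabic stemmer, and the margin rule in `classify` replaces the trained classifier the abstract mentions.

```python
from typing import Dict, Tuple

# Hypothetical toy lexicon: stem -> (positive score, negative score).
# These entries and values are invented; they are NOT ArSenL data.
TOY_LEXICON: Dict[str, Tuple[float, float]] = {
    "جميل": (0.75, 0.0),   # "beautiful"
    "سيء": (0.0, 0.625),   # "bad"
}


def toy_stem(token: str) -> str:
    """Strip the definite-article prefix; a stand-in for a real stemmer."""
    if token.startswith("ال") and len(token) > 3:
        return token[2:]
    return token


def sentiment_features(text: str) -> Tuple[float, float]:
    """Sum positive and negative scores over all stems found in the lexicon."""
    pos = 0.0
    neg = 0.0
    for token in text.split():
        scores = TOY_LEXICON.get(toy_stem(token))
        if scores is not None:
            pos += scores[0]
            neg += scores[1]
    return pos, neg


def classify(text: str, margin: float = 0.1) -> str:
    """Simple margin rule standing in for a trained polarity classifier."""
    pos, neg = sentiment_features(text)
    if pos - neg > margin:
        return "positive"
    if neg - pos > margin:
        return "negative"
    return "neutral"
```

Because the lookup is a dictionary access per token, the whole computation stays on the device, which is the property the abstract emphasizes; a real system would replace the margin rule with a classifier trained on the aggregated scores.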
ACM Transactions on Asian and Low-Resource Language Information Processing | 2017
Ahmad Al-Sallab; Ramy Baly; Hazem M. Hajj; Khaled Bashir Shaban; Wassim El-Hajj; Gilbert Badaro
While research on English opinion mining has already achieved significant progress and success, work on Arabic opinion mining is still lagging. This is mainly due to the relative recency of research efforts in developing natural language processing (NLP) methods for Arabic, the difficulty of handling its morphological complexity, and the lack of large-scale opinion resources for Arabic. To close this gap, we examine the class of models that are used for English and do not require extensive use of NLP or opinion resources. In particular, we consider the Recursive Auto Encoder (RAE). However, RAE models are not as successful in Arabic as they are in English, due to their limitations in handling the morphological complexity of Arabic, providing more complete and comprehensive input features for the auto encoder, and performing semantic composition in the natural way constituents are combined to express the overall meaning. In this article, we propose A Recursive Deep Learning Model for Opinion Mining in Arabic (AROMA) that addresses these limitations. AROMA was evaluated on three Arabic corpora representing different genres and writing styles. Results show that AROMA achieved significant performance improvements compared to the baseline RAE. It also outperformed several well-known approaches in the literature.
ACM Transactions on Information Systems | 2016
Ramy Baly; Roula Hobeica; Hazem M. Hajj; Wassim El-Hajj; Khaled Bashir Shaban; Ahmad Al-Sallab
This article introduces a sentiment analysis approach that adopts the way humans read, interpret, and extract sentiment from text. Our motivation builds on the assumption that human interpretation should lead to the most accurate assessment of sentiment in text. We call this automated process Human Reading for Sentiment (HRS). Previous research in sentiment analysis has produced many frameworks that can fit one or more of the HRS aspects; however, none of these methods has addressed them all in one approach. HRS provides a meta-framework for developing new sentiment analysis methods or improving existing ones. The proposed framework provides a theoretical lens for zooming in and evaluating aspects of any sentiment analysis method to identify gaps for improvements towards matching the human reading process. Key steps in HRS include the automation of humans' low-level and high-level cognitive text processing. This methodology paves the way towards the integration of psychology with computational linguistics and machine learning to employ models of pragmatics and discourse analysis for sentiment analysis. HRS is tested with two state-of-the-art methods: one is based on feature engineering, and the other is based on deep learning. HRS highlighted the gaps in both methods and showed improvements for both.
ACM Transactions on Asian and Low-Resource Language Information Processing | 2017
Ramy Baly; Hazem M. Hajj; Nizar Habash; Khaled Bashir Shaban; Wassim El-Hajj
Accurate sentiment analysis models encode the sentiment of words and their combinations to predict the overall sentiment of a sentence. This task becomes challenging when applied to morphologically rich languages (MRL). In this article, we evaluate the use of deep learning advances, namely the Recursive Neural Tensor Networks (RNTN), for sentiment analysis in Arabic as a case study of MRLs. While Arabic may not be considered the only representative of all MRLs, the challenges faced and proposed solutions in Arabic are common to many other MRLs. We identify, illustrate, and address MRL-related challenges and show how RNTN is affected by the morphological richness and orthographic ambiguity of the Arabic language. To address the challenges with sentiment extraction from text in MRL, we propose to explore different orthographic features as well as different morphological features at multiple levels of abstraction ranging from raw words to roots. A key requirement for RNTN is the availability of a sentiment treebank: a collection of syntactic parse trees annotated for sentiment at all levels of constituency, which currently exists only for English. Therefore, our contribution also includes the creation of the first Arabic Sentiment Treebank (ArSenTB) that is morphologically and orthographically enriched. Experimental results show that, compared to the basic RNTN proposed for English, our solution achieves significant improvements of up to 8% absolute at the phrase level and 10.8% absolute at the sentence level, measured by average F1 score. It also outperforms well-known classifiers including Support Vector Machines, Recursive Auto Encoders, and Long Short-Term Memory by 7.6%, 3.2%, and 1.6% absolute respectively, all models being trained with similar morphological considerations.
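For readers unfamiliar with RNTN, the following is a minimal sketch of the composition step in its commonly published form for English: a parent vector is computed from two child vectors through a bilinear tensor term plus a linear term, passed through tanh. It is not the article's implementation; the function name `rntn_compose` and the parameter layout are assumptions chosen for illustration, and none of the morphological or orthographic features described above are modeled.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]
Tensor = List[Matrix]  # one (2d x 2d) slice per output dimension


def rntn_compose(left: Vector, right: Vector, V: Tensor, W: Matrix) -> Vector:
    """Compose two child vectors of size d into one parent vector of size d.

    parent[k] = tanh(x^T V[k] x + W[k] . x), where x = [left; right].
    """
    x = left + right  # list concatenation, length 2d
    n = len(x)
    parent = []
    for k in range(len(left)):
        # Bilinear term: interactions between every pair of child components.
        bilinear = sum(x[i] * V[k][i][j] * x[j] for i in range(n) for j in range(n))
        # Linear term: the standard recursive-network composition.
        linear = sum(W[k][j] * x[j] for j in range(n))
        parent.append(math.tanh(bilinear + linear))
    return parent
```

Applied bottom-up over a parse tree, this function yields a vector at every constituent, which is why the model needs a treebank with sentiment labels at all levels of constituency for training.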
Proceedings of the Third Arabic Natural Language Processing Workshop | 2017
Ramy Baly; Gilbert Badaro; Georges El-Khoury; Rawan Moukalled; Rita Aoun; Hazem M. Hajj; Wassim El-Hajj; Nizar Habash; Khaled Bashir Shaban
Opinion mining in Arabic is a challenging task given the rich morphology of the language. The task becomes more challenging when it is applied to Twitter data, which contains additional sources of noise, such as the use of unstandardized dialectal variations, nonconformity to grammatical rules, the use of Arabizi and code-switching, and the use of non-text objects such as images and URLs to express opinion. In this paper, we perform an analytical study to observe how such linguistic phenomena vary across different Arab regions. This study of Arabic Twitter characterization aims at providing a better understanding of Arabic tweets and fostering advanced research on the topic. Furthermore, we explore the performance of the two schools of machine learning on Arabic Twitter, namely the feature engineering approach and the deep learning approach. We consider models that have achieved state-of-the-art performance for opinion mining in English. Results highlight the advantages of using deep learning-based models, and confirm the importance of using morphological abstractions to address Arabic’s complex morphology.
Procedia Computer Science | 2017
Ramy Baly; Georges El-Khoury; Rawan Moukalled; Rita Aoun; Hazem M. Hajj; Khaled Bashir Shaban; Wassim El-Hajj
Sentiment analysis in Arabic is challenging due to the complex morphology of the language. The task becomes more challenging when considering Twitter data that contain significant amounts of noise, such as the use of Arabizi, code-switching, and different dialects that vary significantly across the Arab world, the use of non-textual objects to express sentiments, and the frequent occurrence of misspellings and grammatical mistakes. Modeling sentiment in Twitter should become easier when we understand the characteristics of Twitter data and how its usage varies from one Arab region to another. We describe our effort to create the first Multi-Dialect Arabic Sentiment Twitter Dataset (MD-ArSenTD), which is composed of tweets collected from 12 Arab countries, annotated for sentiment and dialect. We use this dataset to analyze tweets collected from Egypt and the United Arab Emirates (UAE), with the aim of discovering distinctive features that may facilitate sentiment analysis. We also perform a comparative evaluation of different sentiment models on Egyptian and UAE tweets. These models are based on feature engineering and deep learning, and have already achieved state-of-the-art accuracies in English sentiment analysis. Results indicate the superior performance of deep learning models, the importance of morphological features in Arabic NLP, and that handling dialectal Arabic leads to different outcomes depending on the country from which the tweets are collected.
IEEE Transactions on Semiconductor Manufacturing | 2012
Ramy Baly; Hazem M. Hajj
Meeting of the Association for Computational Linguistics | 2015
Ahmad A. Al Sallab; Hazem M. Hajj; Gilbert Badaro; Ramy Baly; Wassim El Hajj; Khaled Bashir Shaban
Meeting of the Association for Computational Linguistics | 2017
Ramy Baly; Gilbert Badaro; Ali Hamdi; Rawan Moukalled; Rita Aoun; Georges El-Khoury; Ahmad A. Al Sallab; Hazem M. Hajj; Nizar Habash; Khaled Bashir Shaban; Wassim El-Hajj