Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Yllias Chali is active.

Publication


Featured research published by Yllias Chali.


Journal of Artificial Intelligence Research | 2009

Complex question answering: unsupervised learning approaches and experiments

Yllias Chali; Shafiq R. Joty; Sadid A. Hasan

Complex questions that require inferencing and synthesizing information from multiple documents can be seen as a kind of topic-oriented, informative multi-document summarization where the goal is to produce a single text as a compressed version of a set of documents with a minimum loss of relevant information. In this paper, we experiment with one empirical method and two unsupervised statistical machine learning techniques: K-means and Expectation Maximization (EM), for computing relative importance of the sentences. We compare the results of these approaches. Our experiments show that the empirical approach outperforms the other two techniques and EM performs better than K-means. However, the performance of these approaches depends entirely on the feature set used and the weighting of these features. In order to measure the importance and relevance to the user query we extract different kinds of features (i.e. lexical, lexical semantic, cosine similarity, basic element, tree kernel based syntactic and shallow-semantic) for each of the document sentences. We use a local search technique to learn the weights of the features. To the best of our knowledge, no study has used tree kernel functions to encode syntactic/semantic information for more complex tasks such as computing the relatedness between the query sentences and the document sentences in order to generate query-focused summaries (or answers to complex questions). For each of our methods of generating summaries (i.e. empirical, K-means and EM) we show the effects of syntactic and shallow-semantic features over the bag-of-words (BOW) features.
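To make the clustering step concrete, here is a minimal Python sketch of one plausible reading of it: sentences are clustered with K-means over simple TF-IDF vectors, and the cluster whose centroid is closest to the query is kept. The function name, the TF-IDF features, and the two-cluster setup are illustrative assumptions; the paper uses a much richer feature set (lexical, lexical semantic, basic element, and tree-kernel features) and also an EM variant.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def kmeans_sentence_selection(sentences, query, n_clusters=2, n_keep=5):
    # Represent each sentence (and the query) as a TF-IDF vector.
    vec = TfidfVectorizer(stop_words="english")
    X = vec.fit_transform(sentences + [query])
    sent_vecs, query_vec = X[:-1], X[-1]

    # Cluster the sentences; the cluster whose centroid is most similar
    # to the query is treated as the "query-relevant" cluster.
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(sent_vecs)
    centroid_sims = cosine_similarity(km.cluster_centers_, query_vec.toarray()).ravel()
    relevant = int(np.argmax(centroid_sims))

    # Within the relevant cluster, rank sentences by similarity to the query.
    sims = cosine_similarity(sent_vecs, query_vec).ravel()
    idx = [i for i in np.argsort(-sims) if km.labels_[i] == relevant]
    return [sentences[i] for i in idx[:n_keep]]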


Natural Language Engineering | 2012

Query-focused multi-document summarization: Automatic data annotations and supervised learning approaches

Yllias Chali; Sadid A. Hasan

In this paper, we apply different supervised learning techniques to build query-focused multi-document summarization systems, where the task is to produce automatic summaries in response to a given query or specific information request stated by the user. A huge amount of labeled data is a prerequisite for supervised training. It is expensive and time-consuming when humans perform the labeling task manually. Automatic labeling can be a good remedy to this problem. We employ five different automatic annotation techniques to build extracts from human abstracts using ROUGE, Basic Element overlap, syntactic similarity measure, semantic similarity measure, and Extended String Subsequence Kernel. The supervised methods we use are Support Vector Machines, Conditional Random Fields, Hidden Markov Models, Maximum Entropy, and two ensemble-based approaches. During different experiments, we analyze the impact of automatic labeling methods on the performance of the applied supervised methods. To our knowledge, no other study has deeply investigated and compared the effects of using different automatic annotation techniques on different supervised learning approaches in the domain of query-focused multi-document summarization.
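A minimal sketch of the automatic-annotation idea, under simplifying assumptions: a document sentence is labeled extract-worthy when its unigram overlap with the human abstract crosses a threshold (a crude stand-in for the ROUGE, Basic Element, syntactic, semantic, and ESSK measures used in the paper), and an SVM is then trained on those labels. The function names and the threshold value are illustrative, not the authors' implementation.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

def unigram_overlap(sentence, abstract):
    s, a = set(sentence.lower().split()), set(abstract.lower().split())
    return len(s & a) / max(len(s), 1)

def auto_label(doc_sentences, abstract, threshold=0.3):
    # 1 = extract-worthy, 0 = not, according to overlap with the human abstract.
    return [1 if unigram_overlap(s, abstract) >= threshold else 0
            for s in doc_sentences]

def train_extractor(doc_sentences, abstract):
    # Requires both labels to occur at least once; a real pipeline would pool
    # sentences from many document clusters before training.
    labels = auto_label(doc_sentences, abstract)
    vec = TfidfVectorizer(stop_words="english")
    X = vec.fit_transform(doc_sentences)
    clf = LinearSVC().fit(X, labels)
    return vec, clf  # apply to sentences of unseen documents at test time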


Information Processing and Management | 2011

Improving graph-based random walks for complex question answering using syntactic, shallow semantic and extended string subsequence kernels

Yllias Chali; Sadid A. Hasan; Shafiq R. Joty

The task of answering complex questions requires inferencing and synthesizing information from multiple documents and can be seen as a kind of topic-oriented, informative multi-document summarization. In generic summarization, the stochastic, graph-based random walk method for computing the relative importance of textual units (i.e. sentences) has proved very successful; in that method, each sentence is typically encoded as a vector of TF*IDF word weights. However, the major limitation of the TF*IDF approach is that it only retains the frequency of the words and does not take into account sequence, syntactic, or semantic information. This paper presents the impact of syntactic and semantic information in the graph-based random walk method for answering complex questions. Initially, we apply tree kernel functions to perform the similarity measures between sentences in the random walk framework. Then, we extend our work further to incorporate the Extended String Subsequence Kernel (ESSK) to perform the task in a similar manner. Experimental results show the effectiveness of the use of kernels to include the syntactic and semantic information for this task.
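For the kernel side, the following Python sketch computes a plain word-level string subsequence kernel in the style of Lodhi et al.; ESSK extends this with disjunctions over words and semantic tags, which is not reproduced here. The naive recursion is only meant to show the idea on short sentences, and the resulting similarity values would serve as edge weights in the sentence graph.

def ssk(s, t, n=2, lam=0.5):
    # s, t: lists of words. Returns the (unnormalized) kernel value counting
    # common word subsequences of length n, decayed by lam per spanned position.
    def k_prime(i, s, t):
        if i == 0:
            return 1.0
        if len(s) < i or len(t) < i:
            return 0.0
        x = s[-1]
        total = lam * k_prime(i, s[:-1], t)
        for j, tj in enumerate(t):
            if tj == x:
                total += k_prime(i - 1, s[:-1], t[:j]) * lam ** (len(t) - j + 1)
        return total

    if len(s) < n or len(t) < n:
        return 0.0
    x = s[-1]
    value = ssk(s[:-1], t, n, lam)
    for j, tj in enumerate(t):
        if tj == x:
            value += k_prime(n - 1, s[:-1], t[:j]) * lam ** 2
    return value

def ssk_similarity(sent1, sent2, n=2, lam=0.5):
    # Normalized kernel, usable as a sentence-to-sentence edge weight.
    s, t = sent1.lower().split(), sent2.lower().split()
    norm = (ssk(s, s, n, lam) * ssk(t, t, n, lam)) ** 0.5
    return ssk(s, t, n, lam) / norm if norm else 0.0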


Canadian Conference on Artificial Intelligence | 2009

A SVM-Based Ensemble Approach to Multi-Document Summarization

Yllias Chali; Sadid A. Hasan; Shafiq R. Joty

In this paper, we present a Support Vector Machine (SVM) based ensemble approach to the extractive multi-document summarization problem. Although SVMs can have good generalization ability, their performance may degrade through wrong classifications. We use a committee of several SVMs, i.e. Cross-Validation Committees (CVC), to form an ensemble of classifiers, where the strategy is to improve performance by correcting the errors of one classifier using the accurate output of the others. The practicality and effectiveness of this technique are demonstrated by the experimental results.
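A minimal sketch, assuming a straightforward reading of Cross-Validation Committees: each committee member is an SVM trained on a different cross-validation split of the labeled sentences, and predictions are combined by majority vote. Function names and the number of members are illustrative.

import numpy as np
from sklearn.model_selection import KFold
from sklearn.svm import LinearSVC

def train_cvc(X, y, n_members=5):
    # One SVM per fold, each trained on the (k-1)/k training portion.
    committee = []
    for train_idx, _ in KFold(n_splits=n_members, shuffle=True, random_state=0).split(X):
        committee.append(LinearSVC().fit(X[train_idx], y[train_idx]))
    return committee

def cvc_predict(committee, X):
    # Majority vote across committee members for each sentence.
    votes = np.stack([clf.predict(X) for clf in committee])
    return (votes.mean(axis=0) >= 0.5).astype(int)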


Meeting of the Association for Computational Linguistics | 2008

Improving the Performance of the Random Walk Model for Answering Complex Questions

Yllias Chali; Shafiq R. Joty

We consider the problem of answering complex questions that require inferencing and synthesizing information from multiple documents and can be seen as a kind of topic-oriented, informative multi-document summarization. The stochastic, graph-based method for computing the relative importance of textual units (i.e. sentences) is very successful in generic summarization. In this method, a sentence is encoded as a vector in which each component represents the occurrence frequency (TF*IDF) of a word. However, the major limitation of the TF*IDF approach is that it only retains the frequency of the words and does not take into account sequence, syntactic, or semantic information. In this paper, we study the impact of syntactic and shallow semantic information in the graph-based method for answering complex questions.
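A LexRank-style sketch of the graph-based method described here, under the assumption of plain TF*IDF cosine similarities as edge weights and a standard damped power iteration; the paper's contribution is precisely to replace this bag-of-words similarity with syntactic and shallow-semantic measures.

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def random_walk_scores(sentences, damping=0.85, tol=1e-6, max_iter=100):
    # Sentences become TF*IDF vectors; cosine similarities define the graph.
    X = TfidfVectorizer(stop_words="english").fit_transform(sentences)
    W = cosine_similarity(X)
    np.fill_diagonal(W, 0.0)
    # Row-normalize to obtain a transition matrix, then run power iteration.
    P = W / np.maximum(W.sum(axis=1, keepdims=True), 1e-12)
    n = len(sentences)
    scores = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        new = (1 - damping) / n + damping * P.T @ scores
        if np.abs(new - scores).sum() < tol:
            break
        scores = new
    return scores  # higher score = more central / important sentence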


Computational Linguistics | 2015

Towards topic-to-question generation

Yllias Chali; Sadid A. Hasan

This paper is concerned with automatic generation of all possible questions from a topic of interest. Specifically, we consider that each topic is associated with a body of texts containing useful information about the topic. Then, questions are generated by exploiting the named entity information and the predicate argument structures of the sentences present in the body of texts. The importance of the generated questions is measured using Latent Dirichlet Allocation by identifying the subtopics (which are closely related to the original topic) in the given body of texts and applying the Extended String Subsequence Kernel to calculate their similarity with the questions. We also propose the use of syntactic tree kernels for the automatic judgment of the syntactic correctness of the questions. The questions are ranked by considering both their importance (in the context of the given body of texts) and syntactic correctness. To the best of our knowledge, no previous study has accomplished this task in our setting. A series of experiments demonstrate that the proposed topic-to-question generation approach can significantly outperform the state-of-the-art results.
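A rough Python sketch of the ranking step, with loud assumptions: subtopics come from scikit-learn's LDA, question importance is approximated by topic-word overlap rather than ESSK, and syntactic correctness scores are taken as given (in the paper they come from a syntactic tree kernel). Only the final weighted combination mirrors the paper's ranking idea.

import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

def rank_questions(body_sentences, questions, correctness_scores, n_topics=5, alpha=0.5):
    # Fit LDA on the body of texts to identify subtopics.
    vec = CountVectorizer(stop_words="english")
    counts = vec.fit_transform(body_sentences)
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=0).fit(counts)

    # Importance: how strongly a question's words load on the learned subtopics
    # (a rough stand-in for ESSK similarity between questions and subtopics).
    q_counts = vec.transform(questions)
    topic_word = lda.components_ / lda.components_.sum(axis=1, keepdims=True)
    importance = np.asarray((q_counts @ topic_word.T).max(axis=1)).ravel()

    # Final rank: weighted combination of importance and syntactic correctness.
    final = alpha * importance + (1 - alpha) * np.asarray(correctness_scores)
    order = np.argsort(-final)
    return [questions[i] for i in order]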


Meeting of the Association for Computational Linguistics | 2009

Do Automatic Annotation Techniques Have Any Impact on Supervised Complex Question Answering?

Yllias Chali; Sadid A. Hasan; Shafiq R. Joty

In this paper, we analyze the impact of different automatic annotation methods on the performance of supervised approaches to the complex question answering problem (defined in the DUC-2007 main task). A huge amount of annotated or labeled data is a prerequisite for supervised training. The task of labeling can be accomplished either by humans or by computer programs. When humans are employed, the whole process becomes time-consuming and expensive, so, in order to produce a large set of labeled data, we prefer the automatic annotation strategy. We apply five different automatic annotation techniques to produce labeled data using the ROUGE similarity measure, Basic Element (BE) overlap, a syntactic similarity measure, a semantic similarity measure, and the Extended String Subsequence Kernel (ESSK). The representative supervised methods we use are Support Vector Machines (SVM), Conditional Random Fields (CRF), Hidden Markov Models (HMM), and Maximum Entropy (MaxEnt). Evaluation results are presented to show the impact.
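A hypothetical harness, not from the paper, showing the shape of the experimental grid: each annotation scheme yields its own label vector, each supervised learner is trained on it, and every combination is scored. Sequence models (CRF, HMM) are omitted here, and cross-validated F1 stands in for the ROUGE-based summary evaluation actually used.

from sklearn.linear_model import LogisticRegression  # stand-in for MaxEnt
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC

def compare_annotations(X, labelings):
    # labelings: dict mapping annotation-scheme name -> binary label vector.
    learners = {"SVM": LinearSVC(), "MaxEnt": LogisticRegression(max_iter=1000)}
    results = {}
    for ann_name, y in labelings.items():
        for clf_name, clf in learners.items():
            # F1 on held-out folds as a cheap proxy for summary quality.
            score = cross_val_score(clf, X, y, cv=5, scoring="f1").mean()
            results[(ann_name, clf_name)] = score
    return results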


Empirical Methods in Natural Language Processing | 2008

Selecting Sentences for Answering Complex Questions

Yllias Chali; Shafiq R. Joty

Complex questions that require inferencing and synthesizing information from multiple documents can be seen as a kind of topic-oriented, informative multi-document summarization. In this paper, we have experimented with one empirical and two unsupervised statistical machine learning techniques: K-means and Expectation Maximization (EM), for computing the relative importance of the sentences. However, the performance of these approaches depends entirely on the feature set used and the weighting of these features. We extracted different kinds of features (i.e. lexical, lexical semantic, cosine similarity, basic element, tree kernel based syntactic and shallow-semantic) for each of the document sentences in order to measure its importance and relevance to the user query. We used a local search technique to learn the weights of the features. For all our methods of generating summaries, we have shown the effects of syntactic and shallow-semantic features over the bag-of-words (BOW) features.
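The local search over feature weights could look like the following hill-climbing sketch (an assumption about its form, not the authors' code): perturb one weight at a time and keep the move only if the evaluation of the resulting summary improves; in the paper that evaluation would be a ROUGE comparison against reference summaries.

import random

def local_search_weights(features, evaluate, n_features, iters=200, step=0.1, seed=0):
    # features: list of per-sentence feature vectors (lists of floats).
    # evaluate: callable mapping a list of sentence scores to the quality of the
    # summary built from the top-scoring sentences (supplied by the caller).
    rng = random.Random(seed)
    weights = [1.0 / n_features] * n_features

    def score_all(w):
        return [sum(wi * fi for wi, fi in zip(w, f)) for f in features]

    best = evaluate(score_all(weights))
    for _ in range(iters):
        cand = list(weights)
        i = rng.randrange(n_features)
        cand[i] = max(0.0, cand[i] + rng.choice([-step, step]))
        quality = evaluate(score_all(cand))
        if quality > best:  # hill climbing: keep only improving moves
            weights, best = cand, quality
    return weights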


Empirical Methods in Natural Language Processing | 2014

Fear the REAPER: A System for Automatic Multi-Document Summarization with Reinforcement Learning

Cody Rioux; Sadid A. Hasan; Yllias Chali

This paper explores alternate algorithms, reward functions, and feature sets for performing multi-document summarization using reinforcement learning, with a strong focus on reproducibility. We show that ROUGE results can be improved using a unigram and bigram similarity metric when training a learner to select sentences for summarization. Learners are trained to summarize document clusters based on various algorithms and reward functions and are then evaluated using ROUGE. Our experiments show a statistically significant improvement of 1.33%, 1.58%, and 2.25% for ROUGE-1, ROUGE-2, and ROUGE-L scores, respectively, when compared with the performance of the state of the art in automatic summarization with reinforcement learning on the DUC2004 dataset. Furthermore, query-focused extensions of our approach show an improvement of 1.37% and 2.31% for ROUGE-2 and ROUGE-SU4, respectively, over query-focused extensions of the state of the art with reinforcement learning on the DUC2006 dataset.
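A minimal sketch of a reward of this kind: candidate summaries are scored against a reference by a weighted combination of unigram and bigram overlap. The recall-style normalization and the weighting are assumptions, not REAPER's exact reward function.

def ngram_set(text, n):
    tokens = text.lower().split()
    return set(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def overlap(candidate, reference, n):
    cand, ref = ngram_set(candidate, n), ngram_set(reference, n)
    return len(cand & ref) / max(len(ref), 1)  # recall-oriented, like ROUGE-N

def reward(candidate_summary, reference_summary, bigram_weight=0.5):
    # Combined unigram and bigram similarity to the reference summary.
    uni = overlap(candidate_summary, reference_summary, 1)
    bi = overlap(candidate_summary, reference_summary, 2)
    return (1 - bigram_weight) * uni + bigram_weight * bi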


Information Processing and Management | 2015

A reinforcement learning formulation to the complex question answering problem

Yllias Chali; Sadid A. Hasan; Mustapha Mojahid

Highlights: a reinforcement learning formulation for complex question answering; abstract summaries used for a small amount of supervision via reward scores; a user interaction component incorporated to guide candidate sentence selection; experiments reveal that systems trained with user interaction perform better; the reinforcement system is able to learn automatically and effectively.

We use extractive multi-document summarization techniques to perform complex question answering and formulate it as a reinforcement learning problem. Given a set of complex questions, a list of relevant documents per question, and the corresponding human-generated summaries (i.e. answers to the questions) as training data, the reinforcement learning module iteratively learns a number of feature weights in order to facilitate the automatic generation of summaries, i.e. answers to previously unseen complex questions. A reward function is used to measure the similarities between the candidate (machine-generated) summary sentences and the abstract summaries. In the training stage, the learner iteratively selects the important document sentences to be included in the candidate summary, analyzes the reward function, and updates the related feature weights accordingly. The final weights are used to generate summaries as answers to unseen complex questions in the testing stage. Evaluation results show the effectiveness of our system. We also incorporate user interaction into the reinforcement learner to guide the candidate summary sentence selection process. Experiments reveal the positive impact of the user interaction component on the reinforcement learning framework.
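A minimal sketch of the training loop described above, with an assumed structure and update rule: the learner greedily adds the sentence with the highest weighted feature score, observes how the reward of the partial summary changes, and nudges the feature weights accordingly. The user-interaction component is omitted, and reward_fn would be a ROUGE-like similarity to the abstract summary.

import numpy as np

def train_episode(sent_features, sentences, abstract, reward_fn, weights,
                  lr=0.01, summary_size=5):
    # sent_features: float array of shape (n_sentences, n_features).
    # reward_fn(summary_text, abstract) -> float.
    chosen, prev_reward = [], 0.0
    remaining = list(range(len(sentences)))
    for _ in range(summary_size):
        scores = sent_features[remaining] @ weights
        pick = remaining.pop(int(np.argmax(scores)))
        chosen.append(pick)
        summary = " ".join(sentences[i] for i in chosen)
        r = reward_fn(summary, abstract)
        # Move the weights toward the selected sentence's features, scaled by
        # the change in reward this selection produced.
        weights += lr * (r - prev_reward) * sent_features[pick]
        prev_reward = r
    return weights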

Collaboration


Dive into Yllias Chali's collaborations.

Top Co-Authors

Shafiq R. Joty

Qatar Computing Research Institute

Kaisar Imam

University of Lethbridge

Meru Brunn

University of Lethbridge

Cody Rioux

University of Lethbridge
