Saeedeh Momtazi | Researchain

Publication

Featured research published by Saeedeh Momtazi.

Conference on Information and Knowledge Management | 2009

A word clustering approach for language model-based sentence retrieval in question answering systems

Saeedeh Momtazi; Dietrich Klakow

In this paper we propose a term clustering approach to improve the performance of sentence retrieval in Question Answering (QA) systems. As the search in question answering is conducted over smaller segments of data than in a document retrieval task, the problems of data sparsity and exact matching become more critical. We propose Language Modeling (LM) techniques to overcome such problems and improve the sentence retrieval performance. Our proposed methods include building class-based models by term clustering, and then employing higher order n-grams with the new class-based model. We report our experiments on the TREC 2007 questions from the QA track. The results show that the methods investigated here improved the mean average precision of sentence retrieval from 23.62% to 29.91%.
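As a rough illustration of the class-based idea in this abstract, the sketch below scores sentences against a question by matching word clusters instead of exact words. The cluster table `WORD_TO_CLASS`, the add-one smoothing, and all function names are assumptions made for this example; the paper's own clustering algorithm and model are not reproduced here.

```python
import math
from collections import Counter

# Hypothetical word-to-cluster mapping; a real system would learn it from a corpus.
WORD_TO_CLASS = {
    "car": 0, "automobile": 0, "vehicle": 0,
    "buy": 1, "purchase": 1,
    "city": 2, "town": 2,
}
NUM_CLASSES = 3
UNKNOWN = NUM_CLASSES  # extra class for words outside the mapping


def to_classes(tokens):
    """Replace each token by its cluster id (UNKNOWN if the word is unseen)."""
    return [WORD_TO_CLASS.get(t, UNKNOWN) for t in tokens]


def class_log_likelihood(question, sentence):
    """Log P(question | sentence) under an add-one smoothed class unigram model."""
    counts = Counter(to_classes(sentence))
    total = len(sentence) + (NUM_CLASSES + 1)  # add-one over all classes
    return sum(math.log((counts[c] + 1) / total) for c in to_classes(question))


question = ["buy", "car"]
sentences = [
    ["he", "wants", "to", "purchase", "an", "automobile"],
    ["the", "town", "is", "small"],
]
# Rank sentences by how well their class distribution explains the question.
ranked = sorted(sentences, key=lambda s: class_log_likelihood(question, s), reverse=True)
```

The first sentence shares no word with the question, yet it ranks first because "purchase" and "automobile" fall into the same clusters as "buy" and "car".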

Systems, Man and Cybernetics | 2009

An overview on the existing language models for prediction systems as writing assistant tools

Masood Ghayoomi; Saeedeh Momtazi

The prediction task in natural language processing is to guess the missing letter, word, phrase, or sentence that is likely to follow a given segment of text. Since the 1980s, many systems using different methods have been developed for different languages. This paper gives an overview of the prediction methods that have been used for more than two decades and presents a general classification of the approaches. The three main categories of the classification are statistical modeling, knowledge-based modeling, and heuristic (adaptive) modeling.

Wiley Interdisciplinary Reviews-Data Mining and Knowledge Discovery | 2013

Topic modeling for expert finding using latent Dirichlet allocation

Saeedeh Momtazi; Felix Naumann

The task of expert finding is to rank the experts in the search space given a field of expertise as an input query. In this paper, we propose a topic modeling approach for this task. The proposed model uses latent Dirichlet allocation (LDA) to induce probabilistic topics. In the first step of our algorithm, the main topics of a document collection are extracted using LDA. The extracted topics present the connection between expert candidates and user queries. In the second step, the topics are used as a bridge to find the probability of selecting each candidate for a given query. The candidates are then ranked based on these probabilities. The experimental results on the Text REtrieval Conference (TREC) Enterprise track for 2005 and 2006 show that the proposed topic‐based approach outperforms the state‐of‐the‐art profile‐ and document‐based models, which use information retrieval methods to rank experts. Moreover, we present the superiority of the proposed topic‐based approach to the improved document‐based expert finding systems, which consider additional information such as local context, candidate prior, and query expansion.
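The "topics as a bridge" step can be pictured with a small sketch. It assumes the topic distributions are already available (the toy numbers below are made up, not LDA output) and scores each candidate by summing over topics, which under a uniform candidate prior is proportional to the probability of the candidate given the query. The names and the exact scoring formula are assumptions for illustration, not the paper's stated model.

```python
# Toy distributions; in practice they would come from an LDA model
# trained on the document collection (values here are invented).
P_WORD_GIVEN_TOPIC = [
    {"retrieval": 0.5, "ranking": 0.4, "protein": 0.1},  # topic 0
    {"retrieval": 0.1, "ranking": 0.1, "protein": 0.8},  # topic 1
]
P_TOPIC_GIVEN_CANDIDATE = {
    "alice": [0.9, 0.1],
    "bob": [0.2, 0.8],
}


def query_given_topic(query, topic):
    """P(query | topic) as a product of per-word probabilities."""
    p = 1.0
    for word in query:
        p *= P_WORD_GIVEN_TOPIC[topic].get(word, 1e-6)  # small floor for unseen words
    return p


def rank_candidates(query):
    """Score each candidate by sum_z P(query | z) * P(z | candidate)."""
    scores = {}
    for cand, topic_mix in P_TOPIC_GIVEN_CANDIDATE.items():
        scores[cand] = sum(
            query_given_topic(query, z) * pz for z, pz in enumerate(topic_mix)
        )
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Calling `rank_candidates(["retrieval", "ranking"])` puts the hypothetical candidate "alice" first, because her topic mixture is dominated by the topic that generates those words.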

International ACM SIGIR Conference on Research and Development in Information Retrieval | 2010

Hierarchical Pitman-Yor language model for information retrieval

Saeedeh Momtazi; Dietrich Klakow

In this paper, we propose a new application of a Bayesian language model based on the Pitman-Yor process for information retrieval. This model is a generalization of the Dirichlet distribution. The Pitman-Yor process produces a power-law distribution, which is one of the statistical properties of word frequency in natural language. Our experiments on Robust04 indicate that this model improves document retrieval performance compared to the commonly used Dirichlet prior and absolute discounting smoothing techniques.
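For context, the two baseline smoothing techniques named in this abstract can be written down in a few lines. The sketch below uses their standard textbook forms; it does not implement the Pitman-Yor model itself, and the parameter values, the toy document, and the background model `p_bg` are assumptions chosen for the example.

```python
from collections import Counter


def dirichlet_prob(word, doc_counts, doc_len, p_bg, mu=2000.0):
    """Dirichlet prior smoothing: (c(w, d) + mu * P(w | C)) / (|d| + mu)."""
    return (doc_counts[word] + mu * p_bg[word]) / (doc_len + mu)


def abs_discount_prob(word, doc_counts, doc_len, p_bg, delta=0.7):
    """Absolute discounting: subtract delta from each seen count and
    hand the freed probability mass to the background model."""
    unique = len(doc_counts)  # number of distinct words in the document
    discounted = max(doc_counts[word] - delta, 0.0) / doc_len
    return discounted + (delta * unique / doc_len) * p_bg[word]


doc = ["language", "model", "language", "retrieval"]
doc_counts = Counter(doc)
# Hypothetical background (collection) model over a four-word vocabulary.
p_bg = {"language": 0.4, "model": 0.3, "retrieval": 0.2, "smoothing": 0.1}
```

Both functions give non-zero probability to "smoothing", which never occurs in the toy document, and both distributions sum to one over the vocabulary.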

Information Processing and Management | 2015

Bridging the vocabulary gap between questions and answer sentences

Saeedeh Momtazi; Dietrich Klakow

We introduce two novel LM-based models to relax the exact matching assumption in IR. The class-based model clusters words to provide a coarse-grained word representation. The trigger model captures pairs of trigger and target words to find word relationships. Different types of word co-occurrence and triggering are studied within the models. We further studied the combination of both models to achieve the best result.

We propose two novel language models to improve the performance of sentence retrieval in Question Answering (QA): a class-based language model and a trained trigger language model. As the search in sentence retrieval is conducted over smaller segments of text than in document retrieval, the problems of data sparsity and exact matching become more critical. Different techniques such as the translation model have also been proposed to overcome the word mismatch problem. Our class-based and trained trigger language models, however, take different approaches to this end and are shown to outperform the existing models. The class model uses a word clustering algorithm to capture term relationships. In this model, we assume a relation between the terms that belong to the same cluster; as a result, they can be substituted when searching for relevant sentences. The trigger model captures pairs of trigger and target words while training on a large corpus. The model considers a question and a sentence to be related if a trigger word appears in the question and the sentence contains the corresponding target word. For both proposed models, we introduce different notions of co-occurrence to find word relations. In addition, we study the impact of corpus size and domain on the models. Our experiments on the TREC QA collection verify that the proposed models significantly improve sentence retrieval performance compared to the state-of-the-art translation model. While the translation model based on mutual information (Karimzadehgan and Zhai, 2010) achieves a Mean Average Precision (MAP) of 0.3927, the class model achieves 0.4174 MAP and the trigger model raises the performance to 0.4381.

Information Sciences, Signal Processing and their Applications | 2007

A POS-based fuzzy word clustering algorithm for continuous speech recognition systems

Saeedeh Momtazi; Hossein Sameti; Mohammad Bahrani; Nazila Hafezi

Word-based n-gram language models are widely used in continuous speech recognition systems. Such models must be extracted from large corpora. Since Persian corpora are not rich, the extracted language models are not reliable. For this reason, most researchers extract class n-grams instead of word n-grams. In this research, a new idea for fuzzy word clustering is presented in which each word can be assigned to more than one class. The fuzzy c-means algorithm is used as our clustering method, and we have examined its various parameters. Finally, this algorithm was applied to the 20,000 most frequent Persian words extracted from the “Persian Text Corpus”. The extracted language models are evaluated by the perplexity criterion, and the results show that a considerable reduction in perplexity has been achieved. The language model was also evaluated in a speaker-independent continuous speech recognition system, where it improved the system accuracy.

International Conference on Asian Language Processing | 2009

Challenges in Developing Persian Corpora from Online Resources

Masood Ghayoomi; Saeedeh Momtazi

Persian is one of the Indo-European languages which has borrowed its script from Arabic, a member of the Semitic language family. Since Persian and Arabic scripts are so similar, problems arise when we want to process an electronic text. In this paper, some of the common problems faced experimentally in developing a corpus for Persian from on-line materials are discussed. The sources of the problems are the Persian script itself; mixture with the Arabic script; Persian orthography; the typists’ typing styles; and mixing Persian code pages with Arabic code pages in operating systems.

Intelligent Systems Design and Applications | 2009

A Combined Query Expansion Technique for Retrieving Opinions from Blogs

Saeedeh Momtazi; Stefan Kazalski; Dietrich Klakow

In this paper, we discuss the role of the retrieval component in a TREC-style opinion question answering system. Since blog retrieval differs from traditional ad-hoc document retrieval, we need to work on dedicated retrieval methods. In particular, we focus on a new query expansion technique to retrieve people’s opinions from blog posts. We propose a combined approach for expanding queries while considering two aspects: finding more relevant data, and finding more opinionative data. We introduce a method to select opinion-bearing terms for query expansion based on a chi-squared test, and combine this new query expansion in a linear weighting scheme with the original query terms and relevance feedback terms from the web. We report our experiments on the TREC 2006 and TREC 2007 queries from the blog retrieval track. The results show that the methods investigated here improved the mean average precision of document retrieval from 17.91% to 25.20% on TREC 2006 and from 22.28% to 32.61% on TREC 2007 queries.
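The two ingredients named in this abstract, a chi-squared test for picking opinion-bearing terms and a linear weighting of term sources, can be sketched as follows. The 2x2 contingency layout, the interpolation weights, and the function names are assumptions for illustration; the paper's exact counts and weights are not given here.

```python
def chi_squared(a, b, c, d):
    """Chi-squared statistic for a 2x2 contingency table:
    a = opinionated docs containing the term, b = other docs containing it,
    c = opinionated docs without the term,   d = other docs without it."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # degenerate table: no evidence of association
    return n * (a * d - b * c) ** 2 / denom


def combine_weights(original, feedback, opinion, w_orig=0.5, w_fb=0.3, w_op=0.2):
    """Linearly interpolate term weights from the three sources."""
    terms = set(original) | set(feedback) | set(opinion)
    return {
        t: w_orig * original.get(t, 0.0)
        + w_fb * feedback.get(t, 0.0)
        + w_op * opinion.get(t, 0.0)
        for t in terms
    }
```

A term spread evenly across both document sets, as in `chi_squared(10, 10, 10, 10)`, scores zero, while a term confined to one set scores high. Because the statistic is symmetric, a practical selector would probably also check that the term is more frequent in the opinionated set.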

Information Processing and Management | 2018

Unsupervised Latent Dirichlet Allocation for supervised question classification

Saeedeh Momtazi

Question answering systems assist users in satisfying their information needs more precisely by providing focused responses to their questions. Among the various systems developed for such a purpose, community-based question answering has recently received researchers’ attention due to the large amount of user-generated questions and answers in social question-and-answer platforms. Reusing such data sources requires an accurate information retrieval component enhanced by a question classifier. Question classification gives the system information about question categories, so that it can focus on questions and answers from categories relevant to the input question. In this paper, we propose a new method based on unsupervised Latent Dirichlet Allocation for classifying questions in community-based question answering. Our method first uses unsupervised topic modeling to extract topics from a large amount of unlabeled data. The learned topics are then used in the training phase to find their association with the available category labels in the training data. The category mixture of topics is finally used to predict the label of unseen data.

Journal of Information Science | 2016

Generating query suggestions by exploiting latent semantics in query logs

Saeedeh Momtazi; Fabian Lindenberg

Search engines assist users in expressing their information needs more accurately by reformulating the issued queries automatically and suggesting the generated formulations to the users. Many approaches to query suggestion draw on the information stored in query logs, recommending recorded queries that are textually similar to the current user’s query or that frequently co-occurred with it in the past. In this paper, we propose an approach that concentrates on deducing the actual information need from the user’s query. The challenge therein lies not only in processing keyword queries, which are often short and possibly ambiguous, but especially in handling the complexity of natural language that allows users to express the same or similar information needs in various differing ways. We expect a higher-level semantic representation of a user’s query to more accurately reflect the information need than the explicit query terms alone can. To this aim, we employ latent Dirichlet allocation as a probabilistic topic model to reveal latent semantics in the query log. Our evaluations show that, whereas purely topic-based query suggestion performs the worst, the interpolation of our proposed topic-based model with the baseline word-based model that generates suggestions based on matching query terms achieves significant improvements in suggestion quality over the already well performing purely word-based approach.
