Bahar Salehi | Researchain

Publication

Featured research published by Bahar Salehi.

North American Chapter of the Association for Computational Linguistics | 2015

A Word Embedding Approach to Predicting the Compositionality of Multiword Expressions

Bahar Salehi; Paul Cook; Timothy Baldwin

This paper presents the first attempt to use word embeddings to predict the compositionality of multiword expressions. We consider both single- and multi-prototype word embeddings. Experimental results show that, in combination with a back-off method based on string similarity, word embeddings outperform a method using count-based distributional similarity. Our best results are competitive with, or superior to, state-of-the-art methods over three standard compositionality datasets, which include two types of multiword expressions and two languages.
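As a rough illustration of the idea in this abstract, one simple way to score compositionality with embeddings is to compare the vector of the whole expression with the vectors of its component words using cosine similarity. The sketch below is only an assumption-laden toy: the vectors are hand-written, and the `compositionality_score` helper and its equal-weight average are hypothetical choices, not the model or data used in the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention for zero vectors (assumption)
    return dot / (norm_u * norm_v)


def compositionality_score(mwe_vec, component_vecs):
    """Hypothetical score: mean similarity between the expression and its parts."""
    sims = [cosine(mwe_vec, c) for c in component_vecs]
    return sum(sims) / len(sims)


# Toy, hand-written vectors purely for illustration (not real embeddings).
embeddings = {
    "climate_change": [0.9, 0.8, 0.1],
    "climate": [1.0, 0.7, 0.0],
    "change": [0.7, 0.9, 0.2],
    "ivory_tower": [0.0, 0.1, 1.0],
    "ivory": [1.0, 0.0, 0.1],
    "tower": [0.8, 0.3, 0.0],
}

compositional = compositionality_score(
    embeddings["climate_change"],
    [embeddings["climate"], embeddings["change"]],
)
idiomatic = compositionality_score(
    embeddings["ivory_tower"],
    [embeddings["ivory"], embeddings["tower"]],
)
print(f"climate change: {compositional:.2f}, ivory tower: {idiomatic:.2f}")
```

With these toy vectors the expression whose vector lies close to its components receives the higher score, which is the behaviour a graded compositionality measure is meant to capture.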

Conference of the European Chapter of the Association for Computational Linguistics | 2014

Using Distributional Similarity of Multi-way Translations to Predict Multiword Expression Compositionality

Bahar Salehi; Paul Cook; Timothy Baldwin

We predict the compositionality of multiword expressions using distributional similarity between each component word and the overall expression, based on translations into multiple languages. We evaluate the method over English noun compounds, English verb particle constructions and German noun compounds. We show that the estimation of compositionality is improved when using translations into multiple languages, as compared to simply using distributional similarity in the source language. We further find that string similarity complements distributional similarity.

1 Compositionality of MWEs

Multiword expressions (hereafter MWEs) are combinations of words which are lexically, syntactically, semantically or statistically idiosyncratic (Sag et al., 2002; Baldwin and Kim, 2009). Much research has been carried out on the extraction and identification of MWEs¹ in English (Schone and Jurafsky, 2001; Pecina, 2008; Fazly et al., 2009) and other languages (Dias, 2003; Evert and Krenn, 2005; Salehi et al., 2012). However, considerably less work has addressed the task of predicting the meaning of MWEs, especially in non-English languages. As a step in this direction, the focus of this study is on predicting the compositionality of MWEs.

¹ In this paper, we follow Baldwin and Kim (2009) in considering MWE “identification” to be a token-level disambiguation task, and MWE “extraction” to be a type-level lexicon induction task.

An MWE is fully compositional if its meaning is predictable from its component words, and it is non-compositional (or idiomatic) if not. For example, stand up “rise to one’s feet” is compositional, because its meaning is clear from the meaning of the components stand and up. However, the meaning of strike up “to start playing” is largely unpredictable from the component words strike and up. In this study, following McCarthy et al. (2003) and Reddy et al. (2011), we consider compositionality to be graded, and aim to predict the degree of compositionality. For example, in the dataset of Reddy et al. (2011), climate change is judged to be 99% compositional, while silver screen is 48% compositional and ivory tower is 9% compositional. Formally, we model compositionality prediction as a regression task.

An explicit handling of MWEs has been shown to be useful in NLP applications (Ramisch, 2012). As an example, Carpuat and Diab (2010) proposed two strategies for integrating MWEs into statistical machine translation. They show that even a large scale bilingual corpus cannot capture all the necessary information to translate MWEs, and that in adding the facility to model the compositionality of MWEs into their system, they could improve translation quality. Acosta et al. (2011) showed that treating non-compositional MWEs as a single unit in information retrieval improves retrieval effectiveness. For example, while searching for documents related to ivory tower, we are almost certainly not interested in documents relating to elephant tusks.

Our approach is to use a large-scale multi-way translation lexicon to source translations of MWEs and their component words, and then model the relative similarity between each of the component words and the MWE, using distributional similarity based on monolingual corpora for the source language and each of the target languages. Our hypothesis is that using distributional similarity in more than one language will improve the prediction of compositionality. Importantly, in order to make the method as language-independent and

CSI International Symposium on Artificial Intelligence and Signal Processing | 2012

A novel genetic-based instance selection method: Using a divide and conquer approach

Borhan Kazimipour; Bahar Salehi; Mansoor Zolghadri Jahromi

The Nearest Neighbor (NN) classifier is a simple classifier which can be used in a variety of applications. However, this classifier is known to be vulnerable and very slow when dealing with redundant, irrelevant or noisy instances. To tackle this problem, we propose a novel method based on the combination of a Genetic Algorithm and a Divide and Conquer Algorithm to select the most relevant instances and hence improve the classification accuracy and reduce the time and space requirements of NN. Our empirical studies confirm that this combination improves the results in all aspects and outperforms previously proposed methods.
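To make the instance selection idea concrete, the sketch below shows how a candidate subset of training instances could be scored, which is the kind of fitness evaluation a genetic algorithm would call repeatedly. It is not the method from the paper: the bit-mask encoding, the `fitness` function, the `size_penalty` weight and the toy data are all assumptions made for illustration, and the genetic search and divide-and-conquer steps themselves are omitted.

```python
def nearest_neighbor_predict(train, query):
    """Return the label of the closest training instance (squared Euclidean distance)."""
    best_label, best_dist = None, float("inf")
    for features, label in train:
        dist = sum((a - b) ** 2 for a, b in zip(features, query))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def fitness(mask, train, validation, size_penalty=0.1):
    """Hypothetical fitness: 1-NN validation accuracy minus a penalty on subset size."""
    selected = [inst for keep, inst in zip(mask, train) if keep]
    if not selected:
        return 0.0  # an empty subset cannot classify anything
    correct = sum(
        nearest_neighbor_predict(selected, x) == y for x, y in validation
    )
    accuracy = correct / len(validation)
    kept_fraction = len(selected) / len(train)
    return accuracy - size_penalty * kept_fraction


# Toy data: the last training instance is a deliberately noisy label.
train = [
    ((0.0, 0.0), "a"),
    ((0.1, 0.1), "a"),
    ((1.0, 1.0), "b"),
    ((0.9, 0.9), "b"),
    ((0.2, 0.1), "b"),  # noisy instance sitting inside class "a"
]
validation = [((0.05, 0.05), "a"), ((0.95, 0.95), "b"), ((0.2, 0.15), "a")]

full_mask = [1, 1, 1, 1, 1]
reduced_mask = [1, 0, 1, 0, 0]  # drops the redundant and noisy instances
print(fitness(full_mask, train, validation), fitness(reduced_mask, train, validation))
```

On this toy data the reduced subset scores higher than the full training set, because dropping the noisy instance fixes a misclassification while the size penalty rewards the smaller subset.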

North American Chapter of the Association for Computational Linguistics | 2015

The Impact of Multiword Expression Compositionality on Machine Translation Evaluation

Bahar Salehi; Nitika Mathur; Paul Cook; Timothy Baldwin

In this paper, we present the first attempt to integrate predicted compositionality scores of multiword expressions into automatic machine translation evaluation, by integrating compositionality scores for English noun compounds into the TESLA machine translation evaluation metric. The attempt is marginally successful, and we speculate on whether a larger-scale attempt is likely to have greater impact.

Empirical Methods in Natural Language Processing | 2014

Detecting Non-compositional MWE Components using Wiktionary

Bahar Salehi; Paul Cook; Timothy Baldwin

We propose a simple unsupervised approach to detecting non-compositional components in multiword expressions based on Wiktionary. The approach makes use of the definitions, synonyms and translations in Wiktionary, and is applicable to any type of MWE in any language, assuming the MWE is contained in Wiktionary. Our experiments show that the proposed approach achieves higher F-score than state-of-the-art methods.

International Conference on Computational Linguistics | 2012

Automatic identification of Persian light verb constructions

Bahar Salehi; Narjes Askarian; Afsaneh Fazly

Multiword expressions pose a challenge to the development of large-scale, semantically-rich Natural Language Processing (NLP) systems. We use a bilingual parallel corpus for automatically extracting Light Verb Constructions (LVCs), a very common type of multiword expression in many languages, including Persian. Using two classifiers, we investigate the usefulness of seven linguistically-informed features for automatically identifying Persian LVCs. To our knowledge, this is the first attempt at the automatic detection of a broad class of Persian LVCs. Results of our experiments show that the proposed features are reasonably successful at the task.

North American Chapter of the Association for Computational Linguistics | 2016

UniMelb at SemEval-2016 Task 3: Identifying Similar Questions by Combining a CNN with String Similarity Measures

Timothy Baldwin; Huizhi Liang; Bahar Salehi; Doris Hoogeveen; Yitong Li; Long Duong

This paper describes the results of the participation of The University of Melbourne in the community question-answering (CQA) task of SemEval 2016 (Task 3-B). We obtained a MAP score of 70.2% on the test set, by combining three classifiers: a Naive Bayes classifier and a support vector machine (SVM), each trained over lexical similarity features, and a convolutional neural network (CNN). The CNN uses word embeddings and machine translation evaluation scores as features.
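For readers unfamiliar with the MAP (mean average precision) figure quoted above, the following sketch shows one standard way to compute it from ranked relevance judgements. It assumes every relevant item appears somewhere in each ranking and uses hypothetical toy rankings; the official SemEval scorer may differ in details such as cut-offs or the handling of queries with no relevant items.

```python
def average_precision(ranked_relevance):
    """Average precision for one query, given relevance flags in ranked order."""
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant position
    # Assumption: all relevant items are present in the ranking.
    return precision_sum / hits if hits else 0.0


def mean_average_precision(rankings):
    """Mean of the per-query average precision values."""
    if not rankings:
        return 0.0
    return sum(average_precision(r) for r in rankings) / len(rankings)


# Hypothetical rankings for two queries (True marks a relevant candidate).
rankings = [
    [True, False, True],
    [False, True],
]
print(f"MAP = {mean_average_precision(rankings):.4f}")
```

Here the first query contributes (1/1 + 2/3) / 2 and the second contributes 1/2, so the mean rewards systems that place relevant candidates near the top of each list.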

International Journal of Computer and Electrical Engineering | 2012

Recognition of Farsi Handwritten Digits Using a Small Feature Set

G. Mirsharif; Mahsa Badami; Bahar Salehi; Zohreh Azimifar

Recognition of Farsi/Persian handwritten numeral characters has been the focus of recent study and has many applications, such as postal code reading and check processing. One important step in any recognition system is feature extraction. We propose to use a small set of only 20 domain-specific features which are simple to understand, easy to implement and extracted in a way similar to how humans discriminate digits. These features are extracted by simply counting the pixels which are confined in different curves of the digits. Unlike universal methods, this way of feature extraction is tailored to the problem. Evaluation of the proposed features shows a 97% recognition rate on the Hoda dataset. This method is scale and shift invariant and no pre-processing is needed.

International Conference on the Theory of Information Retrieval | 2018

Multitask Learning for Query Segmentation in Job Search

Bahar Salehi; Fei Liu; Timothy Baldwin; Wilson Wong

In this paper, we present the first attempt to use multitask learning for query segmentation. We use the semantic category of the words as an auxiliary task and show that segmentation improves when the model is also trained to predict the semantic category of the query terms, outperforming benchmark methods over a novel dataset from a popular job search engine. Our further experiments show that the task of modeling the query term semantics performs better as a standalone task, without adding segmentation as an auxiliary task.

Joint Conference on Lexical and Computational Semantics | 2013