Is this you? Create Your Porfile

Massih-Reza Amini

Centre national de la recherche scientifique

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Massih-Reza Amini is active.

Explore More

Publication

Featured researches published by Massih-Reza Amini.

Sigkdd Explorations | 2014

On power law distributions in large-scale taxonomies

Rohit Babbar; Cornelia Metzig; Ioannis Partalas; Eric Gaussier; Massih-Reza Amini

In many of the large-scale physical and social complex systems phenomena fat-tailed distributions occur, for which different generating mechanisms have been proposed. In this paper, we study models of generating power law distributions in the evolution of large-scale taxonomies such as Open Directory Project, which consist of websites assigned to one of tens of thousands of categories. The categories in such taxonomies are arranged in tree or DAG structured configurations having parent-child relations among them. We first quantitatively analyse the formation process of such taxonomies, which leads to power law distribution as the stationary distributions. In the context of designing classifiers for large-scale taxonomies, which automatically assign unseen documents to leaf-level categories, we highlight how the fat-tailed nature of these distributions can be leveraged to analytically study the space complexity of such classifiers. Empirical evaluation of the space complexity on publicly available datasets demonstrates the applicability of our approach.

international conference on neural information processing | 2013

Maximum-Margin Framework for Training Data Synchronization in Large-Scale Hierarchical Classification

Rohit Babbar; Ioannis Partalas; Eric Gaussier; Massih-Reza Amini

In the context of supervised learning, the training data for large-scale hierarchical classification consist of (i) a set of input-output pairs, and (ii) a hierarchy structure defining parent-child relation among class labels. It is often the case that the hierarchy structure given a-priori is not optimal for achieving high classification accuracy. This is especially true for web-taxonomies such as Yahoo! directory which consist of tens of thousand of classes. Furthermore, an important goal of hierarchy design is to render better navigability and browsing. In this work, we propose a maximum-margin framework for automatically adapting the given hierarchy by using the set of input-output pairs to yield a new hierarchy. The proposed method is not only theoretically justified but also provides a more principled approach for hierarchy flattening techniques proposed earlier, which are ad-hoc and empirical in nature. The empirical results on publicly available large-scale datasets demonstrate that classification with new hierarchy leads to better or comparable generalization performance than the hierarchy flattening techniques.

Archive | 2015

Learning with Partially Labeled and Interdependent Data

Massih-Reza Amini; Nicolas Usunier

This book develops two key machine learning principles: the semi-supervised paradigm and learning with interdependent data. It reveals new applications, primarily web related, that transgress the classical machine learning framework through learning with interdependent data. The book traces how the semi-supervised paradigm and the learning to rank paradigm emerged from new web applications, leading to a massive production of heterogeneous textual data. It explains how semi-supervised learning techniques are widely used, but only allow a limited analysis of the information content and thus do not meet the demands of many web-related tasks. Later chapters deal with the development of learning methods for ranking entities in a large collection with respect to precise information needed. In some cases, learning a ranking function can be reduced to learning a classification function over the pairs of examples. The book proves that this task can be efficiently tackled in a new framework: learning with interdependent data. Researchers and professionals in machine learning will find these new perspectives and solutions valuable. Learning with Partially Labeled and Interdependent Data is also useful for advanced-level students of computer science, particularly those focused on statistics and learning.

acm multimedia | 2013

Multiview semi-supervised ranking for automatic image annotation

Ali Fakeri-Tabrizi; Massih-Reza Amini; Patrick Gallinari

Most photo sharing sites give their users the opportunity to manually label images. The labels collected that way are usually very incomplete due to the size of the image collections: most images are not labeled according to all the categories they belong to, and, conversely, many class have relatively few representative examples. Automated image systems that can deal with small amounts of labeled examples and unbalanced classes are thus necessary to better organize and annotate images. In this work, we propose a multiview semi-supervised bipartite ranking model which allows to leverage the information contained in unlabeled sets of images in order to improve the prediction performance, using multiple descriptions, or views of images. For each topic class, our approach first learns as many view-specific rankers as available views using the labeled data only. These rankers are then improved iteratively by adding pseudo-labeled pairs of examples on which all view-specific rankers agree over the ranking of examples within these pairs. We report on experiments carried out on the NUS-WIDE dataset, which show that the multiview ranking process improves predictive performances when a small number of labeled examples is available specially for unbalanced classes. We show also that our approach achieves significant improvements over a state-of-the art semi-supervised multiview classification model.

international acm sigir conference on research and development in information retrieval | 2016

Health Monitoring on Social Media over Time

Sumit Sidana; Shashwat Mishra; Sihem Amer-Yahia; Marianne Clausel; Massih-Reza Amini

Social media has become a major source for analyzing all aspects of daily life. Thanks to dedicated latent topic analysis methods such as the Ailment Topic Aspect Model (ATAM), public health can now be observed on Twitter. In this work, we are interested in using social media to monitor people’s health over time. The use of tweets has several benefits including instantaneous data availability at virtually no cost. Early monitoring of health data is complementary to post-factum studies and enables a range of applications such as measuring behavioral risk factors and triggering health campaigns. We formulate two problems: health transition detection and health transition prediction. We first propose the Temporal Ailment Topic Aspect Model (TM–ATAM), a new latent model dedicated to solving the first problem by capturing transitions that involve health-related topics. TM–ATAM is a non-obvious extension to ATAM that was designed to extract health-related topics. It learns health-related topic transitions by minimizing the prediction error on topic distributions between consecutive posts at different time and geographic granularities. To solve the second problem, we develop T–ATAM, a Temporal Ailment Topic Aspect Model where time is treated as a random variable natively inside ATAM. Our experiments on an 8-month corpus of tweets show that TM–ATAM outperforms TM–LDA in estimating health-related transitions from tweets for different geographic populations. We examine the ability of TM–ATAM to detect transitions due to climate conditions in different geographic regions. We then show how T–ATAM can be used to predict the most important transition and additionally compare T–ATAM with CDC (Center for Disease Control) data and Google Flu Trends.

international acm sigir conference on research and development in information retrieval | 2014

Re-ranking approach to classification in large-scale power-law distributed category systems

Rohit Babbar; Ioannis Partalas; Eric Gaussier; Massih-Reza Amini

For large-scale category systems, such as Directory Mozilla, which consist of tens of thousand categories, it has been empirically verified in earlier studies that the distribution of documents among categories can be modeled as a power-law distribution. It implies that a significant fraction of categories, referred to as rare categories, have very few documents assigned to them. This characteristic of the data makes it harder for learning algorithms to learn effective decision boundaries which can correctly detect such categories in the test set. In this work, we exploit the distribution of documents among categories to (i) derive an upper bound on the accuracy of any classifier, and (ii) propose a ranking-based algorithm which aims to maximize this upper bound. The empirical evaluation on publicly available large-scale datasets demonstrate that the proposed method not only achieves higher accuracy but also much higher coverage of rare categories as compared to state-of-the-art methods.

north american chapter of the association for computational linguistics | 2016

TwiSE at SemEval-2016 Task 4: Twitter Sentiment Classification

Georgios Balikas; Massih-Reza Amini

This paper describes the participation of the team TwiSE in the SemEval 2016 challenge. Specifically, we participated in Task 4, namely Sentiment Analysis in Twitter for which we implemented sentiment classification systems for subtasks A, B, C and D. Our approach consists of two steps. In the first step, we generate and validate diverse feature sets for twitter sentiment evaluation, inspired by the work of participants of previous editions of such challenges. In the second step, we focus on the optimization of the evaluation measures of the different subtasks. To this end, we examine different learning strategies by validating them on the data provided by the task organisers. For our final submissions we used an ensemble learning approach (stacked generalization) for Subtask A and single linear models for the rest of the subtasks. In the official leaderboard we were ranked 9/35, 8/19, 1/11 and 2/14 for subtasks A, B, C and D respectively.footnote{We make the code available for research purposes at url{this https URL_Sentiment_Evaluation}.}

european conference on information retrieval | 2014

Exploring the Space of IR Functions

Parantapa Goswami; Simon Moura; Eric Gaussier; Massih-Reza Amini; Francis Maes

In this paper we propose an approach to discover functions for IR ranking from a space of simple closed-form mathematical functions. In general, all IR ranking models are based on two basic variables, namely, term frequency and document frequency. Here a grammar for generating all possible functions is defined which consists of the two above said variables and basic mathematical operations - addition, subtraction, multiplication, division, logarithm, exponential and square root. The large set of functions generated by this grammar is filtered by checking mathematical feasibility and satisfiability to heuristic constraints on IR scoring functions proposed by the community. Obtained candidate functions are tested on various standard IR collections and several simple but highly efficient scoring functions are identified. We show that these newly discovered functions are outperforming other state-of-the-art IR scoring models through extensive experimentation on several IR collections. We also compare the performance of functions satisfying IR constraints to those which do not, and show that the former set of functions clearly outperforms the latter one.

conference on information and knowledge management | 2013

Transferring knowledge with source selection to learn IR functions on unlabeled collections

Parantapa Goswami; Massih-Reza Amini; Eric Gaussier

We investigate the problem of learning an IR function on a collection without relevance judgements (called target collection) by transferring knowledge from a selected source collection with relevance judgements. To do so, we first construct, for each query in the target collection, relative relevance judgment pairs using information from the source collection closest to the query (selection and transfer steps), and then learn an IR function from the obtained pairs in the target collection (self-learning step). For the transfer step, the relevance information in the source collection is summarized as a grid that provides, for each term frequency and document frequency values of a word in a document, an empirical estimate of the relevance of the document. The self-learning step iteratively assigns pairwise preferences to documents in the target collection using the scores of the former learned function. We show the effectiveness of our approach through a series of extensive experiments on CLEF and several collections from TREC used either as target or source datasets. Our experiments show the importance of selecting the source collection prior to transfer information to the target collection, and demonstrate that the proposed approach yields results consistently and significantly above state-of-the-art IR functions.

Archive | 2015

Semi-Supervised Learning

Massih-Reza Amini; Nicolas Usunier

In this chapter, we give an overview of different approaches developed in semi-supervised learning, as well as different assumptions leading to these methods. We particularly consider the margin as an indicator of confidence which constitutes the working hypothesis of algorithms that search the decision boundary on low density regions. Following this assumption, we present a bound over the error probability of the voted classifier on the examples for whose margins are above a fixed threshold. As an application, we detail a self-learning algorithm which iteratively assigns pseudo-labels to the set of unlabeled training examples that have their margin above a threshold obtained from this bound and also present a multiview extension of this method.

Explore More