Roberto Pasolini | Researchain

Publication

Featured research published by Roberto Pasolini.

International Conference on Data Technologies and Applications | 2015

A Study on Term Weighting for Text Categorization: A Novel Supervised Variant of tf.idf

Giacomo Domeniconi; Gianluca Moro; Roberto Pasolini; Claudio Sartori

Within text categorization and other data mining tasks, the use of suitable term weighting methods can bring a substantial boost in effectiveness. Several term weighting methods have been presented in the literature, based on assumptions commonly derived from observing the distribution of words in documents. For example, the idf assumption states that words appearing in many documents are usually less important than less frequent ones. Unlike tf.idf and other weighting methods derived from information retrieval, more recently proposed schemes are supervised, i.e. based on knowledge of the membership of training documents to categories. We propose here a supervised variant of the tf.idf scheme, based on computing the usual idf factor without considering documents of the category to be recognized, so that the importance of terms frequently appearing only within it is not underestimated. A further proposed variant is additionally based on relevance frequency, considering occurrences of words within the category itself. In extensive experiments on two commonly used text collections with several unsupervised and supervised weighting schemes, we show that the ones we propose generally perform better than or comparably to the others in terms of accuracy, using two different learning methods.
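
As a rough illustration of the two weighting factors described above, the sketch below treats each training document as a set of terms with one category label. The function names and the max(1, ·) floor on document frequencies are assumptions for illustration; `relevance_frequency` uses the widely cited rf definition log2(2 + a / max(1, c)). The abstract does not give the exact formula of the combined variant, so the factors are shown separately rather than as the paper's scheme.

```python
import math

# Each document is a set of terms; labels[i] is the category of docs[i].

def standard_idf(term, docs):
    """Classic unsupervised idf: log(N / df), with df floored at 1."""
    df = sum(1 for d in docs if term in d)
    return math.log(len(docs) / max(1, df))

def idf_excluding_category(term, docs, labels, category):
    """Supervised idf (assumed form): computed only over documents outside `category`."""
    outside = [d for d, y in zip(docs, labels) if y != category]
    df_out = sum(1 for d in outside if term in d)
    return math.log(len(outside) / max(1, df_out))

def relevance_frequency(term, docs, labels, category):
    """rf factor: log2(2 + a / max(1, c)), where a and c count documents
    containing `term` inside and outside `category`."""
    a = sum(1 for d, y in zip(docs, labels) if y == category and term in d)
    c = sum(1 for d, y in zip(docs, labels) if y != category and term in d)
    return math.log2(2 + a / max(1, c))

# Hypothetical toy training set.
docs = [{"goal", "team"}, {"goal", "match"}, {"goal", "team"},
        {"cpu", "team"}, {"cpu", "code"}, {"code", "team"}]
labels = ["sport", "sport", "sport", "tech", "tech", "tech"]

print(standard_idf("goal", docs))                             # log(6/3)
print(idf_excluding_category("goal", docs, labels, "sport"))  # log(3/1)
print(relevance_frequency("goal", docs, labels, "sport"))     # log2(2 + 3/1)
```

In the toy data, "team" appears in both categories and keeps the same idf either way, while "goal", concentrated in the sport category, receives a higher supervised idf instead of being penalized for its frequency.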

International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management | 2015

Markov chain based method for in-domain and cross-domain sentiment classification

Giacomo Domeniconi; Gianluca Moro; Andrea Pagliarani; Roberto Pasolini

Sentiment classification of textual opinions into positive, negative or neutral polarity is a way to understand people's thoughts about products, services, persons, organisations, and so on. Appropriately interpreting and labelling the polarity of text data is a costly activity when performed by human experts. To cut this labelling cost, cross-domain approaches have been developed, where the goal is to automatically classify the polarity of an unlabelled target text set of a given domain, for example movie reviews, from a labelled source text set of another domain, such as book reviews. Language heterogeneity between source and target domains is the trickiest issue in the cross-domain setting, so a preliminary transfer learning phase is generally required. The best-performing techniques addressing this point are generally complex and require onerous parameter tuning each time a new source-target pair is involved. This paper introduces a simpler method based on Markov chain theory to accomplish both the transfer learning and the sentiment classification tasks. This straightforward technique requires less parameter calibration effort. Experiments on popular text sets show that our approach achieves performance comparable with other works.
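
A minimal sketch of how a Markov chain can carry sentiment from source vocabulary to target vocabulary, assuming states are terms plus one absorbing state per polarity class. The edge rules (term co-occurrence within a document, term-to-class links from labeled source documents), the number of propagation steps, and all names are hypothetical choices for illustration, not the paper's exact model.

```python
from collections import defaultdict

def _add_cooccurrences(counts, terms):
    # Hypothetical rule: every pair of distinct terms in a document is linked.
    unique = set(terms)
    for t in unique:
        for u in unique:
            if u != t:
                counts[t][u] += 1.0

def build_chain(labeled_docs, unlabeled_docs=()):
    """labeled_docs: (terms, label) pairs from the source domain.
    unlabeled_docs: term lists from the target domain; they add only
    term-to-term edges, so target-only words can still reach class states."""
    counts = defaultdict(lambda: defaultdict(float))
    for terms, label in labeled_docs:
        _add_cooccurrences(counts, terms)
        for t in set(terms):
            counts[t][label] += 1.0  # term -> class edge
    for terms in unlabeled_docs:
        _add_cooccurrences(counts, terms)
    labels = {label for _, label in labeled_docs}
    for label in labels:
        counts[label][label] = 1.0  # class states are absorbing
    # Row-normalize counts into transition probabilities.
    chain = {}
    for state, row in counts.items():
        total = sum(row.values())
        chain[state] = {nxt: w / total for nxt, w in row.items()}
    return chain, labels

def classify(chain, labels, terms, steps=3):
    """Spread probability mass from the document's known terms for a few
    steps and return the class state that collects the most mass."""
    known = [t for t in set(terms) if t in chain and t not in labels]
    if not known:
        return None
    dist = {t: 1.0 / len(known) for t in known}
    for _ in range(steps):
        nxt = defaultdict(float)
        for state, p in dist.items():
            for succ, q in chain[state].items():
                nxt[succ] += p * q
        dist = nxt
    return max(sorted(labels), key=lambda c: dist.get(c, 0.0))

source = [(["great", "plot"], "pos"), (["awful", "plot"], "neg")]
target = [["great", "gripping"], ["awful", "boring"]]
chain, labels = build_chain(source, target)
print(classify(chain, labels, ["gripping"]))  # pos
```

Here "gripping" never appears in the labeled source; it reaches the positive class only through "great", thanks to the co-occurrence edge contributed by the unlabeled target document.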

International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management | 2014

Iterative Refining of Category Profiles for Nearest Centroid Cross-Domain Text Classification

Giacomo Domeniconi; Gianluca Moro; Roberto Pasolini; Claudio Sartori

In cross-domain text classification, topic labels for documents of a target domain are predicted by leveraging knowledge of labeled documents of a source domain that covers the same or similar topics, possibly with different words. Existing methods either adapt documents of the source domain to the target or represent both domains in a common space. These methods are mostly based on advanced statistical techniques and often require parameter tuning to obtain optimal performance. We propose a more straightforward approach based on nearest centroid classification: profiles of topic categories are extracted from the source domain and are then adapted by iterative refining steps using the most similar documents in the target domain. Experiments on common benchmark datasets show that this approach, despite its simplicity, achieves accuracy better than or comparable to other methods, using fixed empirical values for its few parameters.
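
A minimal sketch of nearest centroid classification with iterative refinement, assuming bag-of-words `Counter` vectors and cosine similarity. The parameters `iterations` and `top_k`, and the refinement rule (assign each target document to its nearest profile, then rebuild each profile from its `top_k` most similar assigned documents), are illustrative assumptions rather than the paper's exact procedure or parameter values.

```python
import math
from collections import Counter, defaultdict

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def centroid(vectors):
    total = Counter()
    for v in vectors:
        total.update(v)
    return {t: w / len(vectors) for t, w in total.items()}

def nearest(doc, profiles):
    """Nearest centroid rule: the category whose profile is most similar."""
    return max(profiles, key=lambda c: cosine(doc, profiles[c]))

def refine_profiles(source, target, iterations=5, top_k=2):
    """source: (term-count vector, label) pairs; target: unlabeled vectors."""
    by_label = defaultdict(list)
    for vec, label in source:
        by_label[label].append(vec)
    profiles = {c: centroid(vs) for c, vs in by_label.items()}  # initial source profiles
    for _ in range(iterations):
        assigned = defaultdict(list)
        for doc in target:
            c = nearest(doc, profiles)
            assigned[c].append((cosine(doc, profiles[c]), doc))
        refined = {}
        for c, old in profiles.items():
            ranked = sorted(assigned[c], key=lambda pair: pair[0], reverse=True)
            top = [doc for _, doc in ranked[:top_k]]
            refined[c] = centroid(top) if top else old  # keep old profile if nothing assigned
        profiles = refined
    return profiles

source = [(Counter({"ball": 2, "team": 1}), "sport"),
          (Counter({"cpu": 2, "code": 1}), "tech")]
target = [Counter({"team": 1, "match": 2}), Counter({"team": 1, "goal": 2}),
          Counter({"code": 1, "gpu": 2}), Counter({"code": 1, "kernel": 2})]
profiles = refine_profiles(source, target)
print(nearest(Counter({"match": 1}), profiles))  # sport
```

After refinement the sport profile is built only from target documents, so a word like "match", absent from the source domain, is enough to select the right category.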

International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management | 2014

Cross-domain Text Classification through Iterative Refining of Target Categories Representations

Giacomo Domeniconi; Gianluca Moro; Roberto Pasolini; Claudio Sartori

Cross-domain text classification deals with predicting topic labels for documents in a target domain by leveraging knowledge from pre-labeled documents in a source domain, which may use different terms or different distributions thereof. Existing methods address this problem by re-weighting documents from the source domain to transfer them to the target one, or by finding a common feature space for documents of both domains; they often require combining complex techniques, leading to a number of parameters that must be tuned for each dataset to yield optimal performance. We present a simpler method based on creating explicit representations of topic categories, which can be compared for similarity with those of documents. Category representations are initially built from relevant source documents, then iteratively refined by considering the most similar target documents, with relatedness measured by a simple regression model based on cosine similarity, built once at the beginning. This is expected to yield accurate representations of categories in the target domain, which are then used to classify documents therein. Experiments on common benchmark text collections show that this approach obtains results better than or comparable to other methods, using fixed empirical values for its few parameters.
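
The abstract mentions a simple regression model based on cosine similarity but does not specify it. The sketch below assumes an ordinary least-squares line fitted on (cosine similarity, related or not) pairs and clamped to [0, 1]; the training pairs, the name `fit_relatedness`, and the 0.5 selection threshold are hypothetical.

```python
def fit_relatedness(pairs):
    """Least-squares line mapping a cosine similarity to a relatedness score.

    pairs: (cosine_similarity, is_related) with is_related in {0, 1}, e.g.
    computed once between source documents and source category representations.
    """
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    def predict(similarity):
        # Clamp so the output can be read as a relatedness score in [0, 1].
        return min(1.0, max(0.0, slope * similarity + intercept))

    return predict

relatedness = fit_relatedness([(0.9, 1), (0.8, 1), (0.2, 0), (0.1, 0)])
# Hypothetical threshold for choosing target documents during refinement.
candidates = [s for s in (0.95, 0.6, 0.3) if relatedness(s) >= 0.5]
print(candidates)  # [0.95, 0.6]
```

Because the model is fitted once, each refinement step only needs to evaluate a linear function of the similarities it already computes.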

International Conference on Data Technologies and Applications | 2015

A Comparison of Term Weighting Schemes for Text Classification and Sentiment Analysis with a Supervised Variant of tf.idf

Giacomo Domeniconi; Gianluca Moro; Roberto Pasolini; Claudio Sartori

In text analysis tasks like text classification and sentiment analysis, the careful choice of term weighting schemes can have an important impact on effectiveness. Classic unsupervised schemes are based solely on the distribution of terms across documents, while newer supervised ones leverage knowledge of the membership of training documents to categories; the latter are often specifically tailored for either topic or sentiment classification. We propose here a supervised variant of the well-known tf.idf scheme, where the idf factor is computed without considering documents within the category under analysis, so that terms frequently appearing only within it are not penalized. The importance of these terms is further boosted in a second variant inspired by relevance frequency. We performed extensive experiments to compare these novel schemes with known ones, observing top performance in text categorization by topic and satisfactory results in sentiment classification.

Computers & Security | 2015

Decentralized detection of network attacks through P2P data clustering of SNMP data

Walter Cerroni; Gianluca Moro; Roberto Pasolini; Marco Ramilli

The goal of Network Intrusion Detection Systems (NIDSs) is to protect against attacks by inspecting network traffic packets, for instance looking for anomalies and signatures of known attacks. This paper illustrates an approach to attack detection that analyzes only the standard statistics automatically generated by the Simple Network Management Protocol (SNMP), using unsupervised distributed data mining algorithms. We describe the design of a decentralized system composed of a peer-to-peer network of monitoring stations: each of them continuously gathers SNMP statistical observations about the network traffic and runs a distributed data clustering algorithm in cooperation with the other stations. This progressively leads to the construction of a traffic model capable of detecting ongoing attacks in later observations, including potentially previously unknown attacks. To estimate the accuracy of the described system, we performed an extensive number of distributed data clustering runs on data sets of SNMP observations generated from real traffic.
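
The sketch below swaps in a plain, well-known technique to illustrate the decentralized idea: distributed k-means rounds in which each monitoring station shares only per-cluster sums and counts of its observations, followed by distance-based anomaly flagging. It is not the paper's P2P clustering algorithm; the two-dimensional feature vectors (standing in for SNMP statistics), the initial centroids, and the threshold are hypothetical.

```python
import math

def nearest(point, centroids):
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))

def local_summary(points, centroids):
    """Run at one station: per-cluster sums and counts of its local observations."""
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for p in points:
        i = nearest(p, centroids)
        counts[i] += 1
        for j, x in enumerate(p):
            sums[i][j] += x
    return sums, counts

def merge_summaries(summaries, centroids):
    """Combine station summaries into new centroids; raw observations never leave a station."""
    dim = len(centroids[0])
    total_sums = [[0.0] * dim for _ in centroids]
    total_counts = [0] * len(centroids)
    for sums, counts in summaries:
        for i, c in enumerate(counts):
            total_counts[i] += c
            for j in range(dim):
                total_sums[i][j] += sums[i][j]
    return [[s / total_counts[i] for s in total_sums[i]] if total_counts[i] else list(centroids[i])
            for i in range(len(centroids))]

def is_anomalous(observation, centroids, threshold):
    # Flag observations that are far from every cluster of normal traffic.
    return min(math.dist(observation, c) for c in centroids) > threshold

stations = [[(1.0, 1.0), (1.0, 2.0)], [(10.0, 10.0), (11.0, 10.0)]]
centroids = [(0.0, 0.0), (12.0, 12.0)]
for _ in range(3):
    centroids = merge_summaries([local_summary(pts, centroids) for pts in stations], centroids)
print(centroids)                                             # [[1.0, 1.5], [10.5, 10.0]]
print(is_anomalous((50.0, 50.0), centroids, threshold=5.0))  # True
```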

International Conference on Pattern Recognition Applications and Methods | 2016

Job Recommendation from Semantic Similarity of LinkedIn Users' Skills

Giacomo Domeniconi; Gianluca Moro; Andrea Pagliarani; Karin Pasini; Roberto Pasolini

Until recently, job seeking has been a tricky, tedious and time-consuming process, because people looking for a new position had to collect information from many different sources. Job recommendation systems have been proposed to automate and simplify this task, also increasing its effectiveness. However, current approaches rely on scarce, manually collected data that often do not completely reveal people's skills. Our work aims to find relationships between jobs and people's skills using data from LinkedIn users' public profiles. Semantic associations arise by applying Latent Semantic Analysis (LSA). We use the mined semantics to obtain a hierarchical clustering of job positions and to build a job recommendation system. The outcome proves the effectiveness of our method in recommending job positions. Moreover, we argue that our approach is general, because the extracted semantics could be valuable not only for job recommendation systems but also for recruiting systems. Furthermore, we point out that neither the hierarchical clustering nor the recommendation system requires parameters to be tuned.
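
A minimal LSA sketch with NumPy, assuming a job-by-skill count matrix aggregated from profiles. The skills, jobs, counts, and the number of latent dimensions `k` are hypothetical; ranking jobs by cosine similarity after folding the user's skill vector into the latent space is one common way to use LSA for recommendation, not necessarily the paper's exact procedure.

```python
import numpy as np

SKILLS = ["python", "sql", "java", "photoshop", "illustrator"]
JOBS = ["data engineer", "backend developer", "graphic designer"]

# Hypothetical job-by-skill counts, e.g. how many profiles holding each
# position list each skill.
job_skill = np.array([
    [3, 3, 1, 0, 0],
    [1, 1, 3, 0, 0],
    [0, 0, 0, 3, 3],
], dtype=float)

def lsa_rank(job_skill, skill_vector, k=2):
    """Rank jobs by cosine similarity to a user's skills in a k-dimensional LSA space."""
    _, _, vt = np.linalg.svd(job_skill, full_matrices=False)
    v_k = vt[:k].T                       # skills -> latent concepts
    jobs_latent = job_skill @ v_k        # equals U_k * S_k
    user_latent = skill_vector @ v_k     # fold the user's skills into the same space
    norms = np.linalg.norm(jobs_latent, axis=1) * np.linalg.norm(user_latent)
    sims = (jobs_latent @ user_latent) / np.where(norms == 0, 1.0, norms)
    return [int(i) for i in np.argsort(-sims)]

user = np.array([0, 2, 0, 0, 0], dtype=float)  # a profile listing only "sql"
ranking = lsa_rank(job_skill, user)
print([JOBS[i] for i in ranking])
```

With `k = 2` the two programming jobs collapse onto one latent concept, so a user listing only "sql" scores about equally high for both, while the design job scores about zero.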

International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management | 2015

Cross-Domain Sentiment Classification via Polarity-Driven State Transitions in a Markov Model

Giacomo Domeniconi; Gianluca Moro; Andrea Pagliarani; Roberto Pasolini

Nowadays, understanding people's opinions is the key to success, whatever the goal. Sentiment classification automates this task, assigning a positive, negative or neutral polarity to free text concerning services, products, TV programs, and so on. Learning accurate models requires considerable effort from human experts, who have to label text data properly. To reduce this burden, cross-domain approaches are advisable in real-world cases, and transfer learning between source and target domains is usually required due to language heterogeneity. This paper introduces some variants of our previous work [1], where both transfer learning and sentiment classification are performed by means of a Markov model. While splitting documents into sentences does not perform well on common benchmarks, using polarity-bearing terms to drive the classification process shows encouraging results, given that our Markov model only considers single terms without further context information.

International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management | 2018

Cross-domain & In-domain Sentiment Analysis with Memory-based Deep Neural Networks

Gianluca Moro; Andrea Pagliarani; Roberto Pasolini; Claudio Sartori

Cross-domain sentiment classifiers aim to predict the polarity, namely the sentiment orientation, of target text documents by reusing a knowledge model learned from a different source domain. Distinct domains are typically heterogeneous in language, so transfer learning techniques are advisable to support knowledge transfer from source to target. Distributed word representations are able to capture hidden word relationships without supervision, even across domains. Deep neural networks with memory (MemDNN) have recently achieved state-of-the-art performance in several NLP tasks, including cross-domain sentiment classification of large-scale data. The contribution of this work is an extensive experimentation with novel MemDNN architectures, such as the Gated Recurrent Unit (GRU) and the Differentiable Neural Computer (DNC), in both cross-domain and in-domain sentiment classification using GloVe word embeddings. As far as we know, only GRU neural networks have been applied to cross-domain sentiment classification. Sentiment classifiers based on these deep learning architectures are also assessed in terms of scalability and accuracy by gradually increasing the training set size, also showing the effect of fine-tuning, an explicit transfer learning mechanism, on cross-domain tasks. This work shows that MemDNN-based classifiers improve the state of the art on the Amazon Reviews corpus for document-level cross-domain sentiment classification. On the same corpus, DNC outperforms previous approaches in the analysis of a very large in-domain configuration in both binary and fine-grained document sentiment classification. Finally, DNC achieves accuracy comparable with state-of-the-art approaches on the Stanford Sentiment Treebank dataset in both binary and fine-grained single-sentence sentiment classification.
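
To make the memory mechanism of a GRU concrete, the following NumPy sketch runs one common GRU formulation over a sequence of word vectors and feeds the final state to a logistic output. Weights and word vectors are random stand-ins (no GloVe loading, no training, no DNC, no fine-tuning), so it only shows the data flow, not the paper's classifiers; the dimensions and names are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GRUCell:
    """One common GRU formulation; weights here are random, not trained."""

    def __init__(self, input_dim, hidden_dim, rng):
        self.W = {g: rng.normal(0.0, 0.1, (hidden_dim, input_dim)) for g in "zrn"}
        self.U = {g: rng.normal(0.0, 0.1, (hidden_dim, hidden_dim)) for g in "zrn"}
        self.b = {g: np.zeros(hidden_dim) for g in "zrn"}

    def step(self, x, h):
        z = sigmoid(self.W["z"] @ x + self.U["z"] @ h + self.b["z"])        # update gate
        r = sigmoid(self.W["r"] @ x + self.U["r"] @ h + self.b["r"])        # reset gate
        n = np.tanh(self.W["n"] @ x + self.U["n"] @ (r * h) + self.b["n"])  # candidate state
        return (1.0 - z) * h + z * n                                         # blend old memory and candidate

def encode(cell, embeddings, hidden_dim):
    """Read a document word by word and return the final hidden state."""
    h = np.zeros(hidden_dim)
    for x in embeddings:
        h = cell.step(x, h)
    return h

rng = np.random.default_rng(0)
embedding_dim, hidden_dim = 50, 16
cell = GRUCell(embedding_dim, hidden_dim, rng)
document = rng.normal(size=(7, embedding_dim))  # stand-in for 7 pretrained word vectors
h = encode(cell, document, hidden_dim)
w_out = rng.normal(0.0, 0.1, hidden_dim)         # hypothetical untrained output layer
p_positive = float(sigmoid(w_out @ h))           # probability of positive polarity
print(p_positive)
```

The update gate decides, per dimension, how much of the previous memory survives each word, which is what lets the final state summarize a whole review.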

International Conference on Knowledge Discovery and Information Retrieval | 2017

Personalized Web Search via Query Expansion based on User's Local Hierarchically-Organized Files

Gianluca Moro; Roberto Pasolini; Claudio Sartori

Users of Web search engines generally express information needs with short and ambiguous queries, leading to irrelevant results. Personalized search methods improve users' experience by automatically reformulating queries before sending them to the search engine, or by rearranging the received results, according to users' specific interests. A user profile is often built from previous queries, clicked results or, in general, the user's browsing history; different topics must be distinguished in order to obtain an accurate profile. Users commonly organize their locally stored files into sub-directories forming a coherent taxonomy of their own topics of interest, but only a few methods leverage this potentially useful source of knowledge. We propose a novel method where a user profile is built from those files, specifically considering their consistent arrangement in directories. A bag of keywords is extracted for each directory from the text documents within it. The topic of each query can then be inferred, and the query expanded by adding the corresponding keywords, to obtain a more targeted formulation. Experiments are carried out using benchmark data through a repeatable, systematic process, to evaluate objectively how much our method can improve the relevance of query results when applied on top of a third-party search engine.
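
A minimal sketch of directory-based query expansion, assuming plain-text documents grouped by directory name. Scoring keywords with tf-idf across directories, picking the query topic by raw term overlap, and the `top_n` cut-off are assumptions for illustration; the abstract does not specify how keywords are extracted or how the topic is inferred, and the directory names and contents are hypothetical.

```python
import math
from collections import Counter

def build_profile(directories, top_n=3):
    """directories: {directory_name: [document_text, ...]} from the user's disk.

    Returns per-directory term counts and a keyword bag per directory,
    scored here with tf-idf across directories (an assumption)."""
    tf = {d: Counter(w for doc in docs for w in doc.lower().split())
          for d, docs in directories.items()}
    df = Counter(w for counts in tf.values() for w in counts)
    n = len(tf)
    keywords = {}
    for d, counts in tf.items():
        scored = [(c * math.log(n / df[w]), w) for w, c in counts.items()]
        ranked = sorted(scored, key=lambda p: (-p[0], p[1]))  # best score first, ties alphabetical
        keywords[d] = [w for s, w in ranked if s > 0][:top_n]
    return tf, keywords

def expand_query(query, tf, keywords):
    """Infer the query's directory (topic) by term overlap, then append its keywords."""
    terms = query.lower().split()
    best = max(tf, key=lambda d: sum(tf[d][t] for t in terms))
    if sum(tf[best][t] for t in terms) == 0:
        return query  # no matching topic: leave the query unchanged
    return " ".join(terms + [k for k in keywords[best] if k not in terms])

directories = {
    "programming": ["python code pandas", "python code numpy"],
    "reptiles": ["snake venom habitat", "lizard habitat desert"],
}
tf, keywords = build_profile(directories)
print(expand_query("python", tf, keywords))  # python code numpy
```

For this hypothetical user, the ambiguous query "python" is steered toward the programming sense because the term occurs only in that directory.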