Giacomo Domeniconi | Researchain

Publication

Featured researches published by Giacomo Domeniconi.

international conference on data technologies and applications | 2015

A Study on Term Weighting for Text Categorization: A Novel Supervised Variant of tf.idf

Giacomo Domeniconi; Gianluca Moro; Roberto Pasolini; Claudio Sartori

Within text categorization and other data mining tasks, the use of suitable methods for term weighting can bring a substantial boost in effectiveness. Several term weighting methods have been presented in the literature, based on assumptions commonly derived from observing the distribution of words in documents. For example, the idf assumption states that words appearing in many documents are usually not as important as less frequent ones. In contrast to tf.idf and other weighting methods derived from information retrieval, schemes proposed more recently are supervised, i.e. based on knowledge of the membership of training documents in categories. We propose here a supervised variant of the tf.idf scheme, based on computing the usual idf factor without considering documents of the category to be recognized, so that the importance of terms frequently appearing only within it is not underestimated. A further proposed variant is additionally based on relevance frequency, considering occurrences of words within the category itself. In extensive experiments on two recurring text collections with several unsupervised and supervised weighting schemes, we show that the ones we propose generally perform better than or comparably to other ones in terms of accuracy, using two different learning methods.
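The idea of computing idf without the documents of the category to be recognized can be illustrated with a minimal sketch. The function names, the `+ 1` smoothing in the numerator and the `max(1, ...)` guard in the denominator are assumptions made for this illustration; the abstract does not give the exact formula.

```python
import math
from collections import Counter

def idf_excluding_category(docs, labels, category):
    """Return an idf value per term, computed only over documents
    that do NOT belong to `category` (hypothetical helper)."""
    outside = [set(doc) for doc, lab in zip(docs, labels) if lab != category]
    n_out = len(outside)
    # Document frequency of each term among the outside documents.
    df = Counter(term for doc in outside for term in doc)
    vocab = {term for doc in docs for term in doc}
    # Assumed smoothing: +1 in the numerator, max(1, df) in the denominator.
    return {t: math.log((n_out + 1) / max(1, df[t])) for t in vocab}

def tf_idf_excluding_category(doc, idf):
    """Weight each term of a tokenized document by raw tf times the idf above."""
    tf = Counter(doc)
    return {t: tf[t] * idf.get(t, 0.0) for t in tf}
```

With this sketch, a term that is frequent inside the category but rare elsewhere keeps a high idf, whereas a standard idf computed over all documents would lower it.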

BMC Bioinformatics | 2015

GOTA: GO term annotation of biomedical literature

Pietro Di Lena; Giacomo Domeniconi; Luciano Margara; Gianluca Moro

Background: Functional annotation of genes and gene products is a major challenge in the post-genomic era. Nowadays, gene function curation is largely based on manual assignment of Gene Ontology (GO) annotations to genes by using published literature. The annotation task is extremely time-consuming, therefore there is an increasing interest in automated tools that can assist human experts. Results: Here we introduce GOTA, a GO term annotator for biomedical literature. The proposed approach makes use only of information that is readily available from public repositories and it is easily expandable to handle novel sources of information. We assess the classification capabilities of GOTA on a large benchmark set of publications. The overall performances are encouraging in comparison to the state of the art in multi-label classification over large taxonomies. Furthermore, the experimental tests provide some interesting insights into the potential improvement of automated annotation tools. Conclusions: GOTA implements a flexible and expandable model for GO annotation of biomedical literature. The current version of the GOTA tool is freely available at http://gota.apice.unibo.it.

international joint conference on knowledge discovery knowledge engineering and knowledge management | 2015

Markov chain based method for in-domain and cross-domain sentiment classification

Giacomo Domeniconi; Gianluca Moro; Andrea Pagliarani; Roberto Pasolini

Sentiment classification of textual opinions as positive, negative or neutral is a method to understand people's thoughts about products, services, persons, organisations, and so on. Correctly interpreting and labelling the polarity of text data is a costly activity if performed by human experts. To cut this labelling cost, new cross-domain approaches have been developed where the goal is to automatically classify the polarity of an unlabelled target text set of a given domain, for example movie reviews, from a labelled source text set of another domain, such as book reviews. Language heterogeneity between source and target domain is the trickiest issue in the cross-domain setting, so a preliminary transfer learning phase is generally required. The best performing techniques addressing this point are generally complex and require onerous parameter tuning each time a new source-target pair is involved. This paper introduces a simpler method based on Markov chain theory to accomplish both transfer learning and sentiment classification tasks. In fact, this straightforward technique requires a lower parameter calibration effort. Experiments on popular text sets show that our approach achieves performance comparable with other works.

international joint conference on knowledge discovery, knowledge engineering and knowledge management | 2014

Iterative Refining of Category Profiles for Nearest Centroid Cross-Domain Text Classification

Giacomo Domeniconi; Gianluca Moro; Roberto Pasolini; Claudio Sartori

In cross-domain text classification, topic labels for documents of a target domain are predicted by leveraging knowledge of labeled documents of a source domain, having equal or similar topics with possibly different words. Existing methods either adapt documents of the source domain to the target or represent both domains in a common space. These methods are mostly based on advanced statistical techniques and often require tuning of parameters in order to obtain optimal performance. We propose a more straightforward approach based on nearest centroid classification: profiles of topic categories are extracted from the source domain and are then adapted by iterative refining steps using the most similar documents in the target domain. Experiments on common benchmark datasets show that this approach, despite its simplicity, obtains accuracy measures better than or comparable to other methods, obtained with fixed empirical values for its few parameters.
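The iterative refining of category profiles can be sketched roughly as follows. This is a simplified illustration, not the authors' implementation: the fixed similarity `threshold`, the iteration cap and all function names are assumptions, and documents are taken to be dense term-weight vectors over a shared vocabulary.

```python
import math

def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)

def cosine(a, b):
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))

def centroid(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def classify(doc, profiles):
    """Nearest centroid: pick the category whose profile is most similar."""
    return max(profiles, key=lambda c: cosine(doc, profiles[c]))

def refine_profiles(source, source_labels, target, iterations=5, threshold=0.5):
    # Initial profiles are centroids of the labeled source documents.
    profiles = {
        c: centroid([normalize(d) for d, l in zip(source, source_labels) if l == c])
        for c in set(source_labels)
    }
    for _ in range(iterations):
        groups = {c: [] for c in profiles}
        for d in target:
            c = classify(d, profiles)
            # Assumed selection rule: keep only sufficiently similar target docs.
            if cosine(d, profiles[c]) >= threshold:
                groups[c].append(normalize(d))
        # Rebuild each profile from the selected target documents.
        new_profiles = {c: centroid(g) if g else profiles[c] for c, g in groups.items()}
        if new_profiles == profiles:
            break
        profiles = new_profiles
    return profiles
```

After a few iterations the profiles pick up terms that occur only in the target domain, so documents dominated by such terms can still be matched to the right category.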

international joint conference on knowledge discovery knowledge engineering and knowledge management | 2014

Discovering New Gene Functionalities from Random Perturbations of Known Gene Ontological Annotations

Giacomo Domeniconi; Marco Masseroli; Gianluca Moro; Pietro Pinoli

Genomic annotations describing functional features of genes and proteins through controlled terminologies and ontologies are extremely valuable, especially for computational analyses aimed at inferring new biomedical knowledge. Thanks to the biology revolution led by the introduction of novel DNA sequencing technologies, several repositories of such annotations have become available in the last decade; among them, the ones including Gene Ontology annotations are the most relevant. Nevertheless, the available set of genomic annotations is incomplete, and only some of the available annotations represent highly reliable human curated information. In this paper we propose a novel representation of the annotation discovery problem, so as to enable applying supervised algorithms to predict Gene Ontology annotations of different organism genes. In order to use supervised algorithms even though labeled data to train the prediction model are not available, we propose a random perturbation method of the training set, which creates a new annotation matrix to be used to train the model to recognize new annotations. We tested the effectiveness of our approach on nine Gene Ontology annotation datasets. The results obtained demonstrate that our technique is able to improve novel annotation predictions with respect to state of the art unsupervised methods.
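One way to picture the random perturbation step is the sketch below. It assumes a binary gene-by-term annotation matrix and a perturbation that only removes existing annotations with a fixed probability; both the removal-only rule and the names `perturb_annotations`, `training_pairs` and `drop_prob` are assumptions for illustration, since the abstract does not specify them.

```python
import random

def perturb_annotations(matrix, drop_prob, seed=0):
    """Return a copy of a 0/1 annotation matrix in which each existing
    annotation (1) is removed with probability `drop_prob`."""
    rng = random.Random(seed)
    return [
        [0 if cell == 1 and rng.random() < drop_prob else cell for cell in row]
        for row in matrix
    ]

def training_pairs(matrix, perturbed, term_index):
    """Pair each perturbed gene row (features) with the original
    annotation of that gene for one term (label)."""
    return [(p_row, row[term_index]) for p_row, row in zip(perturbed, matrix)]
```

A supervised model trained on such pairs sees examples where an annotation is absent from the features but present in the label, which is the situation it later has to recognize on real, incomplete data.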

international joint conference on knowledge discovery knowledge engineering and knowledge management | 2014

Cross-domain Text Classification through Iterative Refining of Target Categories Representations

Giacomo Domeniconi; Gianluca Moro; Roberto Pasolini; Claudio Sartori

Cross-domain text classification deals with predicting topic labels for documents in a target domain by leveraging knowledge from pre-labeled documents in a source domain, with different terms or different distributions thereof. Methods exist to address this problem by re-weighting documents from the source domain to transfer them to the target one or by finding a common feature space for documents of both domains; they often require the combination of complex techniques, leading to a number of parameters which must be tuned for each dataset to yield optimal performance. We present a simpler method based on creating explicit representations of topic categories, which can be compared for similarity to the ones of documents. Category representations are initially built from relevant source documents, then are iteratively refined by considering the most similar target documents, with relatedness being measured by a simple regression model based on cosine similarity, built once at the beginning. This is expected to yield accurate representations for categories in the target domain, used to classify documents therein. Experiments on common benchmark text collections show that this approach obtains results better than or comparable to other methods, obtained with fixed empirical values for its few parameters.

international conference on data technologies and applications | 2015

A Comparison of Term Weighting Schemes for Text Classification and Sentiment Analysis with a Supervised Variant of tf.idf

Giacomo Domeniconi; Gianluca Moro; Roberto Pasolini; Claudio Sartori

In text analysis tasks like text classification and sentiment analysis, the careful choice of term weighting schemes can have an important impact on the effectiveness. Classic unsupervised schemes are based solely on the distribution of terms across documents, while newer supervised ones leverage the knowledge of membership of training documents to categories; these latter ones are often specifically tailored for either topic or sentiment classification. We propose here a supervised variant of the well-known tf.idf scheme, where the idf factor is computed without considering documents within the category under analysis, so that terms frequently appearing only within it are not penalized. The importance of these terms is further boosted in a second variant inspired by relevance frequency. We performed extensive experiments to compare these novel schemes to known ones, observing top performances in text categorization by topic and satisfactory results in sentiment classification.

international joint conference on knowledge discovery, knowledge engineering and knowledge management | 2014

Random Perturbations of Term Weighted Gene Ontology Annotations for Discovering Gene Unknown Functionalities

Giacomo Domeniconi; Marco Masseroli; Gianluca Moro; Pietro Pinoli

Computational analyses for biomedical knowledge discovery greatly benefit from the availability of the description of gene and protein functional features expressed through controlled terminologies and ontologies, i.e. of their controlled annotations. In recent years, several databases of such annotations have become available; yet, these annotations are incomplete and only some of them represent highly reliable human curated information. To predict and discover unknown or missing annotations, existing approaches use unsupervised learning algorithms. We propose a new learning method that allows applying supervised algorithms to unsupervised problems, achieving much better annotation predictions. This method, which we also extend from our preceding work with data weighting techniques, is based on the generation of artificial labeled training sets through random perturbations of original data. We tested it on nine Gene Ontology annotation datasets; the results obtained demonstrate that our approach achieves good effectiveness in novel annotation prediction, outperforming state of the art unsupervised methods.

Computer Methods and Programs in Biomedicine | 2016

Cross-organism learning method to discover new gene functionalities

Giacomo Domeniconi; Marco Masseroli; Gianluca Moro; Pietro Pinoli

BACKGROUND Knowledge of gene and protein functions is paramount for the understanding of physiological and pathological biological processes, as well as in the development of new drugs and therapies. Analyses for biomedical knowledge discovery greatly benefit from the availability of gene and protein functional feature descriptions expressed through controlled terminologies and ontologies, i.e., of gene and protein biomedical controlled annotations. In the last years, several databases of such annotations have become available; yet, these valuable annotations are incomplete, include errors and only some of them represent highly reliable human curated information. Computational techniques able to reliably predict new gene or protein annotations with an associated likelihood value are thus paramount. METHODS Here, we propose a novel cross-organisms learning approach to reliably predict new functionalities for the genes of an organism based on the known controlled annotations of the genes of another, evolutionarily related and better studied, organism. We leverage a new representation of the annotation discovery problem and a random perturbation of the available controlled annotations to allow the application of supervised algorithms to predict with good accuracy unknown gene annotations. Taking advantage of the numerous gene annotations available for a well-studied organism, our cross-organisms learning method creates and trains better prediction models, which can then be applied to predict new gene annotations of a target organism. RESULTS We tested and compared our method with the equivalent single organism approach on different gene annotation datasets of five evolutionarily related organisms (Homo sapiens, Mus musculus, Bos taurus, Gallus gallus and Dictyostelium discoideum). Results show both the usefulness of the perturbation method of available annotations for better prediction model training and a great improvement of the cross-organism models with respect to the single-organism ones, without influence of the evolutionary distance between the considered organisms. The generated ranked lists of reliably predicted annotations, which describe novel gene functionalities and have an associated likelihood value, are very valuable both to complement available annotations, for better coverage in biomedical knowledge discovery analyses, and to quicken the annotation curation process, by focusing it on the prioritized novel annotations predicted.

international conference on pattern recognition applications and methods | 2016

Job Recommendation from Semantic Similarity of LinkedIn Users' Skills

Giacomo Domeniconi; Gianluca Moro; Andrea Pagliarani; Karin Pasini; Roberto Pasolini

Until recently, job seeking has been a tricky, tedious and time-consuming process, because people looking for a new position had to collect information from many different sources. Job recommendation systems have been proposed in order to automate and simplify this task, also increasing its effectiveness. However, current approaches rely on scarce manually collected data that often do not completely reveal people's skills. Our work aims to find relationships between jobs and people's skills, making use of data from LinkedIn users' public profiles. Semantic associations arise by applying Latent Semantic Analysis (LSA). We use the mined semantics to obtain a hierarchical clustering of job positions and to build a job recommendation system. The outcome proves the effectiveness of our method in recommending job positions. Moreover, we argue that our approach is general, because the extracted semantics could be valuable not only for job recommendation systems but also for recruiting systems. Furthermore, we point out that both the hierarchical clustering and the recommendation system do not require parameters to be tuned.
