Massimiliano Ciaramita
Publication
Featured research published by Massimiliano Ciaramita.
Conference on Computational Natural Language Learning | 2009
Jan Hajič; Massimiliano Ciaramita; Richard Johansson; Daisuke Kawahara; Maria Antònia Martí; Lluís Màrquez; Adam Meyers; Joakim Nivre; Sebastian Padó; Jan Štěpánek; Pavel Straňák; Mihai Surdeanu; Nianwen Xue; Yi Zhang
For the 11th straight year, the Conference on Computational Natural Language Learning has been accompanied by a shared task whose purpose is to promote natural language processing applications and evaluate them in a standard setting. In 2009, the shared task was dedicated to the joint parsing of syntactic and semantic dependencies in multiple languages. This shared task combines the shared tasks of the previous five years under a unique dependency-based formalism similar to the 2008 task. In this paper, we define the shared task, describe how the data sets were created and show their quantitative properties, report the results and summarize the approaches of the participating systems.
European Semantic Web Conference | 2006
Aldo Gangemi; Carola Catenacci; Massimiliano Ciaramita; Jos Lehmann
We present a comprehensive approach to ontology evaluation and validation, which has become a crucial problem for the development of semantic technologies. Existing evaluation methods are integrated into one single framework by means of a formal model. This model consists, firstly, of a meta-ontology called O2 that characterises ontologies as semiotic objects. Based on O2 and an analysis of existing methodologies, we identify three main types of measures for evaluation: structural measures, which are typical of ontologies represented as graphs; functional measures, which are related to the intended use of an ontology and of its components; and usability-profiling measures, which depend on the level of annotation of the considered ontology. The meta-ontology is then complemented with an ontology of ontology validation called oQual, which provides the means to devise the best set of criteria for choosing an ontology over others in the context of a given project. Finally, we provide a small example of how to apply oQual-derived criteria to a validation case.
International World Wide Web Conferences | 2013
Marco Cornolti; Paolo Ferragina; Massimiliano Ciaramita
In this paper we design and implement a benchmarking framework for fair and exhaustive comparison of entity-annotation systems. The framework is based upon the definition of a set of problems related to the entity-annotation task, a set of measures to evaluate system performance, and a systematic comparative evaluation involving all publicly available datasets, containing texts of various types such as news, tweets and Web pages. Our framework is easily extensible with novel entity annotators, datasets and evaluation measures for comparing systems, and it has been released to the public as open source. We use this framework to perform the first extensive comparison among all available entity annotators over all available datasets, and draw many interesting conclusions about their efficiency and effectiveness. We also draw conclusions about academic versus commercial annotators.
Computational Linguistics | 2011
Mihai Surdeanu; Massimiliano Ciaramita; Hugo Zaragoza
This work investigates the use of linguistically motivated features to improve search, in particular for ranking answers to non-factoid questions. We show that it is possible to exploit existing large collections of question–answer pairs (from online social Question Answering sites) to extract such features and train ranking models which combine them effectively. We investigate a wide range of feature types, some exploiting natural language processing such as coarse word sense disambiguation, named-entity identification, syntactic parsing, and semantic role labeling. Our experiments demonstrate that linguistic features, in combination, yield considerable improvements in accuracy. Depending on the system settings we measure relative improvements of 14% to 21% in Mean Reciprocal Rank and Precision@1, providing some of the most compelling evidence to date that complex linguistic features such as word senses and semantic roles can have a significant impact on large-scale information retrieval tasks.
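For readers unfamiliar with the two metrics named in this abstract, the following is a minimal sketch of how Mean Reciprocal Rank, Precision@1, and a relative improvement are commonly computed. It is not taken from the paper: the function names and the representation of each ranked answer list as a sequence of booleans (True where an answer is judged correct) are assumptions made for illustration.

```python
from typing import Sequence


def reciprocal_rank(relevance: Sequence[bool]) -> float:
    """Return 1/rank of the first correct answer, or 0.0 if none is correct."""
    for rank, is_correct in enumerate(relevance, start=1):
        if is_correct:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(rankings: Sequence[Sequence[bool]]) -> float:
    """Average the reciprocal rank over all questions."""
    return sum(reciprocal_rank(r) for r in rankings) / len(rankings)


def precision_at_1(rankings: Sequence[Sequence[bool]]) -> float:
    """Fraction of questions whose top-ranked answer is correct."""
    hits = sum(1 for r in rankings if len(r) > 0 and r[0])
    return hits / len(rankings)


def relative_improvement(baseline: float, system: float) -> float:
    """Relative gain of `system` over `baseline`, e.g. 0.14 means 14%."""
    return (system - baseline) / baseline


# Hypothetical toy data: three questions, each with a ranked answer list.
toy_rankings = [
    [True, False],                 # correct answer at rank 1
    [False, True],                 # correct answer at rank 2
    [False, False, False, True],   # correct answer at rank 4
]
mrr = mean_reciprocal_rank(toy_rankings)
p_at_1 = precision_at_1(toy_rankings)
```

Under this definition, a relative improvement of 14% means the system's score is 1.14 times the baseline's score on the same metric.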
Conference on Information and Knowledge Management | 2007
Hugo Zaragoza; Henning Rode; Peter Mika; Jordi Atserias; Massimiliano Ciaramita; Giuseppe Attardi
We discuss the problem of ranking very many entities of different types. In particular we deal with a heterogeneous set of types, some being very generic and some very specific. We discuss two approaches for this problem: i) exploiting the entity containment graph and ii) using a Web search engine to compute entity relevance. We evaluate these approaches on the real task of ranking Wikipedia entities typed with a state-of-the-art named-entity tagger. Results show that both approaches can greatly increase the performance of methods based only on passage retrieval.
International World Wide Web Conferences | 2008
Massimiliano Ciaramita; Vanessa Murdock; Vassilis Plachouras
Sponsored search is one of the enabling technologies for today's Web search engines. It corresponds to matching and showing ads related to the user query on the search engine results page. Users are likely to click on topically related ads and the advertisers pay only when a user clicks on their ad. Hence, it is important to be able to predict if an ad is likely to be clicked, and maximize the number of clicks. We investigate the sponsored search problem from a machine learning perspective with respect to three main sub-problems: how to use click data for training and evaluation, which learning framework is more suitable for the task, and which features are useful for existing models. We perform a large-scale evaluation based on data from a commercial Web search engine. Results show that it is possible to learn and evaluate directly and exclusively on click data encoding pairwise preferences following simple and conservative assumptions. We find that online multilayer perceptron learning, based on a small set of features representing content similarity of different kinds, significantly outperforms an information retrieval baseline and other learning models, providing a suitable framework for the sponsored search task.
Empirical Methods in Natural Language Processing | 2003
Massimiliano Ciaramita; Mark Johnson
We present a new framework for classifying common nouns that extends named-entity classification. We used a fixed set of 26 semantic labels, which we called supersenses. These are the labels used by lexicographers developing WordNet. This framework has a number of practical advantages. We show how information contained in the dictionary can be used as additional training data that improves accuracy in learning new nouns. We also define a more realistic evaluation procedure than cross-validation.
Web Search and Data Mining | 2012
Ugo Scaiella; Paolo Ferragina; Andrea Marino; Massimiliano Ciaramita
Search results clustering (SRC) is a challenging algorithmic problem that requires grouping together the results returned by one or more search engines in topically coherent clusters, and labeling the clusters with meaningful phrases describing the topics of the results included in them. In this paper we propose to solve SRC via an innovative approach that consists of modeling the problem as the labeled clustering of the nodes of a newly introduced graph of topics. The topics are Wikipedia-pages identified by means of recently proposed topic annotators [9, 11, 16, 20] applied to the search results, and the edges denote the relatedness among these topics computed by taking into account the linkage of the Wikipedia-graph. We tackle this problem by designing a novel algorithm that exploits the spectral properties and the labels of that graph of topics. We show the superiority of our approach with respect to academic state-of-the-art work [6] and well-known commercial systems (CLUSTY and LINGO3G) by performing an extensive set of experiments on standard datasets and user studies via Amazon Mechanical Turk. We test several standard measures for evaluating the performance of all systems and show a relative improvement of up to 20%.
International Workshop on Data Mining and Audience Intelligence for Advertising | 2007
Vanessa Murdock; Massimiliano Ciaramita; Vassilis Plachouras
Contextual advertising is a growing category of search advertising. It presents a particular challenge to ad placement systems because of the sparseness of the language of advertising. We present a language-independent, knowledge-free system based on SVM ranking. We evaluate it on a large number of advertisements appearing on real Web pages. Our contribution is two new classes of features, based on machine translation technologies, that measure the similarity between ads and Web pages. We show that our features significantly improve performance over baseline techniques.
IEEE Intelligent Systems | 2008
Peter Mika; Massimiliano Ciaramita; Hugo Zaragoza; Jordi Atserias
The problem of semantically annotating Wikipedia inspires a novel method for dealing with domain and task adaptation of semantic taggers in cases where parallel text and metadata are available.