Massimiliano Ciaramita

Publication

Featured research published by Massimiliano Ciaramita.

Conference on Computational Natural Language Learning | 2009

The CoNLL-2009 Shared Task: Syntactic and Semantic Dependencies in Multiple Languages

Jan Hajič; Massimiliano Ciaramita; Richard Johansson; Daisuke Kawahara; Maria Antònia Martí; Lluís Màrquez; Adam Meyers; Joakim Nivre; Sebastian Padó; Jan Štěpánek; Pavel Straňák; Mihai Surdeanu; Nianwen Xue; Yi Zhang

For the 11th straight year, the Conference on Computational Natural Language Learning has been accompanied by a shared task whose purpose is to promote natural language processing applications and evaluate them in a standard setting. In 2009, the shared task was dedicated to the joint parsing of syntactic and semantic dependencies in multiple languages. This shared task combines the shared tasks of the previous five years under a unique dependency-based formalism similar to the 2008 task. In this paper, we define the shared task, describe how the data sets were created and show their quantitative properties, report the results and summarize the approaches of the participating systems.

European Semantic Web Conference | 2006

Modelling ontology evaluation and validation

Aldo Gangemi; Carola Catenacci; Massimiliano Ciaramita; Jos Lehmann

We present a comprehensive approach to ontology evaluation and validation, which has become a crucial problem for the development of semantic technologies. Existing evaluation methods are integrated into a single framework by means of a formal model. This model consists, first, of a meta-ontology called O2 that characterises ontologies as semiotic objects. Based on O2 and an analysis of existing methodologies, we identify three main types of measures for evaluation: structural measures, which are typical of ontologies represented as graphs; functional measures, which are related to the intended use of an ontology and of its components; and usability-profiling measures, which depend on the level of annotation of the considered ontology. The meta-ontology is then complemented with an ontology of ontology validation called oQual, which provides the means to devise the best set of criteria for choosing an ontology over others in the context of a given project. Finally, we provide a small example of how to apply oQual-derived criteria to a validation case.

International World Wide Web Conferences | 2013

A framework for benchmarking entity-annotation systems

Marco Cornolti; Paolo Ferragina; Massimiliano Ciaramita

In this paper we design and implement a benchmarking framework for fair and exhaustive comparison of entity-annotation systems. The framework is based upon the definition of a set of problems related to the entity-annotation task, a set of measures to evaluate system performance, and a systematic comparative evaluation involving all publicly available datasets, containing texts of various types such as news, tweets and Web pages. Our framework is easily extensible with novel entity annotators, datasets and evaluation measures for comparing systems, and it has been released to the public as open source. We use this framework to perform the first extensive comparison among all available entity annotators over all available datasets, and draw conclusions about their efficiency and effectiveness. We also compare academic and commercial annotators.

Computational Linguistics | 2011

Learning to rank answers to non-factoid questions from web collections

Mihai Surdeanu; Massimiliano Ciaramita; Hugo Zaragoza

This work investigates the use of linguistically motivated features to improve search, in particular for ranking answers to non-factoid questions. We show that it is possible to exploit existing large collections of question–answer pairs (from online social Question Answering sites) to extract such features and train ranking models which combine them effectively. We investigate a wide range of feature types, some exploiting natural language processing such as coarse word sense disambiguation, named-entity identification, syntactic parsing, and semantic role labeling. Our experiments demonstrate that linguistic features, in combination, yield considerable improvements in accuracy. Depending on the system settings we measure relative improvements of 14% to 21% in Mean Reciprocal Rank and Precision@1, providing some of the most compelling evidence to date that complex linguistic features such as word senses and semantic roles can have a significant impact on large-scale information retrieval tasks.
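The two metrics named in this abstract, Mean Reciprocal Rank and Precision@1, can be illustrated with a minimal sketch. It uses the standard textbook definitions rather than the authors' evaluation code; the function names and the toy data are hypothetical, and each question is assumed to be represented as a list of booleans ordered by system rank (True marks a correct answer).

```python
from typing import List, Sequence


def reciprocal_rank(relevance: Sequence[bool]) -> float:
    """Return 1/rank of the first correct answer, or 0.0 if none is correct."""
    for position, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(rankings: List[Sequence[bool]]) -> float:
    """Average the reciprocal rank over all questions (assumes a non-empty list)."""
    return sum(reciprocal_rank(r) for r in rankings) / len(rankings)


def precision_at_1(rankings: List[Sequence[bool]]) -> float:
    """Fraction of questions whose top-ranked answer is correct."""
    return sum(1 for r in rankings if r and r[0]) / len(rankings)


# Hypothetical toy data: one list per question, ordered by system rank.
toy_rankings = [
    [True, False, False],         # correct answer ranked first
    [False, True, False],         # correct answer ranked second
    [False, False, False],        # no correct answer retrieved
    [False, False, False, True],  # correct answer ranked fourth
]

print(mean_reciprocal_rank(toy_rankings))  # (1 + 1/2 + 0 + 1/4) / 4
print(precision_at_1(toy_rankings))        # 1 of 4 questions
```

A relative improvement such as those reported in the abstract would then be computed as (new - baseline) / baseline for either metric.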

Conference on Information and Knowledge Management | 2007

Ranking very many typed entities on Wikipedia

Hugo Zaragoza; Henning Rode; Peter Mika; Jordi Atserias; Massimiliano Ciaramita; Giuseppe Attardi

We discuss the problem of ranking very many entities of different types. In particular we deal with a heterogeneous set of types, some being very generic and some very specific. We discuss two approaches for this problem: i) exploiting the entity containment graph and ii) using a Web search engine to compute entity relevance. We evaluate these approaches on the real task of ranking Wikipedia entities typed with a state-of-the-art named-entity tagger. Results show that both approaches can greatly increase the performance of methods based only on passage retrieval.

International World Wide Web Conferences | 2008

Online learning from click data for sponsored search

Massimiliano Ciaramita; Vanessa Murdock; Vassilis Plachouras

Sponsored search is one of the enabling technologies for today's Web search engines. It corresponds to matching and showing ads related to the user query on the search engine results page. Users are likely to click on topically related ads and the advertisers pay only when a user clicks on their ad. Hence, it is important to be able to predict if an ad is likely to be clicked, and maximize the number of clicks. We investigate the sponsored search problem from a machine learning perspective with respect to three main sub-problems: how to use click data for training and evaluation, which learning framework is more suitable for the task, and which features are useful for existing models. We perform a large-scale evaluation based on data from a commercial Web search engine. Results show that it is possible to learn and evaluate directly and exclusively on click data encoding pairwise preferences following simple and conservative assumptions. We find that online multilayer perceptron learning, based on a small set of features representing content similarity of different kinds, significantly outperforms an information retrieval baseline and other learning models, providing a suitable framework for the sponsored search task.
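The idea of learning online from click data encoded as pairwise preferences can be sketched as follows. This is an illustration under stated assumptions, not the paper's method: the abstract does not spell out its assumptions, so the sketch uses a common "skip-above" heuristic (a clicked ad is preferred over unclicked ads displayed above it), and it swaps the multilayer perceptron for a plain linear ranking perceptron to keep the example short. The data layout, function names, and feature values are all hypothetical.

```python
from typing import List, Tuple

# Hypothetical layout: (feature vector, was_clicked), listed in display order.
Ad = Tuple[List[float], bool]
Pair = Tuple[List[float], List[float]]  # (preferred features, other features)


def skip_above_pairs(ranked_ads: List[Ad]) -> List[Pair]:
    """Pair each clicked ad with every unclicked ad shown above it."""
    pairs: List[Pair] = []
    for i, (clicked_feats, clicked) in enumerate(ranked_ads):
        if not clicked:
            continue
        for skipped_feats, skipped_clicked in ranked_ads[:i]:
            if not skipped_clicked:
                pairs.append((clicked_feats, skipped_feats))
    return pairs


def score(weights: List[float], feats: List[float]) -> float:
    """Linear score of an ad under the current weights."""
    return sum(w * f for w, f in zip(weights, feats))


def perceptron_pass(weights: List[float], pairs: List[Pair],
                    learning_rate: float = 1.0) -> List[float]:
    """One online pass: update whenever the preferred ad does not score higher."""
    w = list(weights)
    for preferred, other in pairs:
        if score(w, preferred) - score(w, other) <= 0:
            w = [wi + learning_rate * (p - o)
                 for wi, p, o in zip(w, preferred, other)]
    return w


# Hypothetical impression: three ads in display order, only the second clicked.
impression: List[Ad] = [
    ([0.2, 1.0], False),
    ([0.9, 0.1], True),
    ([0.5, 0.5], False),
]
pairs = skip_above_pairs(impression)
weights = perceptron_pass([0.0, 0.0], pairs)
```

Ads displayed below the clicked ad produce no pairs, since the user may never have looked at them; that is what makes the heuristic conservative.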

Empirical Methods in Natural Language Processing | 2003

Supersense tagging of unknown nouns in WordNet

Massimiliano Ciaramita; Mark Johnson

We present a new framework for classifying common nouns that extends named-entity classification. We used a fixed set of 26 semantic labels, which we called supersenses. These are the labels used by lexicographers developing WordNet. This framework has a number of practical advantages. We show how information contained in the dictionary can be used as additional training data that improves accuracy in learning new nouns. We also define a more realistic evaluation procedure than cross-validation.

Web Search and Data Mining | 2012

Topical clustering of search results

Ugo Scaiella; Paolo Ferragina; Andrea Marino; Massimiliano Ciaramita

Search results clustering (SRC) is a challenging algorithmic problem that requires grouping together the results returned by one or more search engines in topically coherent clusters, and labeling the clusters with meaningful phrases describing the topics of the results included in them. In this paper we propose to solve SRC via an innovative approach that consists of modeling the problem as the labeled clustering of the nodes of a newly introduced graph of topics. The topics are Wikipedia pages identified by means of recently proposed topic annotators [9, 11, 16, 20] applied to the search results, and the edges denote the relatedness among these topics computed by taking into account the linkage of the Wikipedia graph. We tackle this problem by designing a novel algorithm that exploits the spectral properties and the labels of that graph of topics. We show the superiority of our approach with respect to academic state-of-the-art work [6] and well-known commercial systems (CLUSTY and LINGO3G) by performing an extensive set of experiments on standard datasets and user studies via Amazon Mechanical Turk. We test several standard measures for evaluating the performance of all systems and show a relative improvement of up to 20%.

International Workshop on Data Mining and Audience Intelligence for Advertising | 2007

A noisy-channel approach to contextual advertising

Vanessa Murdock; Massimiliano Ciaramita; Vassilis Plachouras

Contextual advertising is a growing category of search advertising. It presents a particular challenge to ad placement systems because of the sparseness of the language of advertising. We present a language-independent, knowledge-free system based on SVM ranking. We evaluate it on a large number of advertisements appearing on real Web pages. Our contribution is two new classes of features, based on machine translation technologies, that measure similarity between ads and Web pages. We show that our features significantly improve performance over baseline techniques.

IEEE Intelligent Systems | 2008

Learning to Tag and Tagging to Learn: A Case Study on Wikipedia

Peter Mika; Massimiliano Ciaramita; Hugo Zaragoza; Jordi Atserias

The problem of semantically annotating Wikipedia inspires a novel method for dealing with domain and task adaptation of semantic taggers in cases where parallel text and metadata are available.
