Gerard de Melo | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Gerard de Melo is active.

Explore More

Publication

Featured researches published by Gerard de Melo.

conference on information and knowledge management | 2009

Towards a universal wordnet by learning from combined evidence

Gerard de Melo; Gerhard Weikum

Lexical databases are invaluable sources of knowledge about words and their meanings, with numerous applications in areas like NLP, IR, and AI. We propose a methodology for the automatic construction of a large-scale multilingual lexical database where words of many languages are hierarchically organized in terms of their meanings and their semantic relations to other words. This resource is bootstrapped from WordNet, a well-known English-language resource. Our approach extends WordNet with around 1.5 million meaning links for 800,000 words in over 200 languages, drawing on evidence extracted from a variety of resources including existing (monolingual) wordnets, (mostly bilingual) translation dictionaries, and parallel corpora. Graph-based scoring functions and statistical learning techniques are used to iteratively integrate this information and build an output graph. Experiments show that this wordnet has a high level of precision and coverage, and that it can be useful in applied tasks such as cross-lingual text classification.

conference on information and knowledge management | 2010

MENTA: inducing multilingual taxonomies from wikipedia

Gerard de Melo; Gerhard Weikum

In recent years, a number of projects have turned to Wikipedia to establish large-scale taxonomies that describe orders of magnitude more entities than traditional manually built knowledge bases. So far, however, the multilingual nature of Wikipedia has largely been neglected. This paper investigates how entities from all editions of Wikipedia as well as WordNet can be integrated into a single coherent taxonomic class hierarchy. We rely on linking heuristics to discover potential taxonomic relationships, graph partitioning to form consistent equivalence classes of entities, and a Markov chain-based ranking approach to construct the final taxonomy. This results in MENTA (Multilingual Entity Taxonomy), a resource that describes 5.4 million entities and is presumably the largest multilingual lexical knowledge base currently available.

web search and data mining | 2014

WebChild: harvesting and organizing commonsense knowledge from the web

Niket Tandon; Gerard de Melo; Fabian M. Suchanek; Gerhard Weikum

This paper presents a method for automatically constructing a large commonsense knowledge base, called WebChild, from Web contents. WebChild contains triples that connect nouns with adjectives via fine-grained relations like hasShape, hasTaste, evokesEmotion, etc. The arguments of these assertions, nouns and adjectives, are disambiguated by mapping them onto their proper WordNet senses. Our method is based on semi-supervised Label Propagation over graphs of noisy candidate assertions. We automatically derive seeds from WordNet and by pattern matching from Web text collections. The Label Propagation algorithm provides us with domain sets and range sets for 19 different relations, and with confidence-ranked assertions between WordNet senses. Large-scale experiments demonstrate the high accuracy (more than 80 percent) and coverage (more than four million fine grained disambiguated assertions) of WebChild.

meeting of the association for computational linguistics | 2016

Relation Classification via Multi-Level Attention CNNs

Linlin Wang; Zhu Cao; Gerard de Melo; Zhiyuan Liu

Relation classification is a crucial ingredient in numerous information extraction systems seeking to mine structured facts from text. We propose a novel convolutional neural network architecture for this task, relying on two levels of attention in order to better discern patterns in heterogeneous contexts. This architecture enables endto-end learning from task-specific labeled data, forgoing the need for external knowledge such as explicit dependency structures. Experiments show that our model outperforms previous state-of-the-art methods, including those relying on much richer forms of prior knowledge.

european semantic web conference | 2015

FrameBase: Representing N-Ary Relations Using Semantic Frames

Jacobo Rouces; Gerard de Melo; Katja Hose

Large-scale knowledge graphs such as those in the Linked Data cloud are typically represented as subject-predicate-object triples. However, many facts about the world involve more than two entities. While n-ary relations can be converted to triples in a number of ways, unfortunately, the structurally different choices made in different knowledge sources significantly impede our ability to connect them. They also make it impossible to query the data concisely and without prior knowledge of each individual source. We present FrameBase, a wide-coverage knowledge-base schema that uses linguistic frames to seamlessly represent and query n-ary relations from other knowledge bases, at different levels of granularity connected by logical entailment. It also opens possibilities to draw on natural language processing techniques for querying and data mining.

european conference on information retrieval | 2007

Multilingual text classification using ontologies

Gerard de Melo; Stefan Siersdorfer

In this paper, we investigate strategies for automatically classifying documents in different languages thematically, geographically or according to other criteria. A novel linguistically motivated text representation scheme is presented that can be used with machine learning algorithms in order to learn classifications from pre-classified examples and then automatically classify documents that might be provided in entirely different languages. Our approach makes use of ontologies and lexical resources but goes beyond a simple mapping from terms to concepts by fully exploiting the external knowledge manifested in such resources and mapping to entire regions of concepts. For this, a graph traversal algorithm is used to explore related concepts that might be relevant. Extensive testing has shown that our methods lead to significant improvements compared to existing approaches.

Sprachwissenschaft | 2015

Lexvo.org:Language-Related Information for the Linguistic Linked Data Cloud

Gerard de Melo

Lexvo.org brings information about languages, words, and other linguistic entities to the Web of Linked Data. It defines URIs for terms, languages, scripts, and characters, which are not only highly interconnected but also linked to a variety of resources on the Web. Additionally, new datasets are being published to contribute to the emerging Linked Data Cloud of Language-Related information.

meeting of the association for computational linguistics | 2014

Structured Learning for Taxonomy Induction with Belief Propagation

Mohit Bansal; David Burkett; Gerard de Melo; Daniel Klein

We present a structured learning approach to inducing hypernym taxonomies using a probabilistic graphical model formulation. Our model incorporates heterogeneous relational evidence about both hypernymy and siblinghood, captured by semantic features based on patterns and statistics from Web n-grams and Wikipedia abstracts. For efficient inference over taxonomy structures, we use loopy belief propagation along with a directed spanning tree algorithm for the core hypernymy factor. To train the system, we extract sub-structures of WordNet and discriminatively learn to reproduce them, using adaptive subgradient stochastic optimization. On the task of reproducing sub-hierarchies of WordNet, our approach achieves a 51% error reduction over a chance baseline, including a 15% error reduction due to the non-hypernym-factored sibling features. On a comparison setup, we find up to 29% relative error reduction over previous work on ancestor F1.

conference on information and knowledge management | 2015

Knowlywood: Mining Activity Knowledge From Hollywood Narratives

Niket Tandon; Gerard de Melo; Abir De; Gerhard Weikum

Despite the success of large knowledge bases, one kind of knowledge that has not received attention so far is that of human activities. An example of such an activity is proposing to someone (to get married). For the computer, knowing that this involves two adults, often but not necessarily a woman and a man, that it often takes place in some romantic location, that it typically involves flowers or jewelry, and that it is usually followed by kissing, is a valuable asset for tasks like natural language dialog, scene understanding, or video search. This corresponds to the challenging task of acquiring semantic frames that capture human activities, their participating agents, and their typical spatio-temporal contexts. This paper presents a novel approach that taps into movie scripts and other narrative texts. We develop a pipeline for semantic parsing and knowledge distillation, to systematically compile semantically refined activity frames. The resulting knowledge base contains hundreds of thousands of activity frames, mined from about two million scenes of movies, TV series, and novels. A manual assessment study, with extensive sampling and statistical significance tests, shows that the frames and their attribute values have an accuracy of at least 80 percent. We also demonstrate the usefulness of activity knowledge by the extrinsic use case of movie scene search.

international joint conference on natural language processing | 2015

Sentiment-Aspect Extraction based on Restricted Boltzmann Machines

Linlin Wang; Kang Liu; Zhu Cao; Jun Zhao; Gerard de Melo

Aspect extraction and sentiment analysis of reviews are both important tasks in opinion mining. We propose a novel sentiment and aspect extraction model based on Restricted Boltzmann Machines to jointly address these two tasks in an unsupervised setting. This model reflects the generation process of reviews by introducing a heterogeneous structure into the hidden layer and incorporating informative priors. Experiments show that our model outperforms previous state-of-the-art methods.

Explore More