Filipe de Sá Mesquita

Publication

Featured research published by Filipe de Sá Mesquita.

ACM/IEEE Joint Conference on Digital Libraries | 2007

FLUX-CIM: flexible unsupervised extraction of citation metadata

Eli Cortez; Altigran Soares da Silva; Marcos André Gonçalves; Filipe de Sá Mesquita; Edleno Silva de Moura

In this paper we propose a knowledge-base approach to help extract the correct components of citations in any given format. Unlike related approaches that rely on manually built knowledge bases (KBs) for recognizing the components of a citation, our KB is automatically constructed from an existing set of sample metadata records from a given area (e.g., computer science or health sciences). Our approach does not rely on patterns encoding the specific delimiters of a particular citation style. It is also unsupervised, in the sense that it does not rely on a learning method that requires a training phase. These features give our technique a high degree of automation and flexibility. To demonstrate the effectiveness and applicability of our approach, we ran experiments in which we applied it to extract information from citations in papers from two different domains. The results indicate precision and recall levels above 94% and perfect extraction for the large majority of citations tested.
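The idea of building a knowledge base from sample metadata records and using it to recognize citation components can be illustrated with a toy sketch. This is not the FLUX-CIM algorithm: the functions `build_kb` and `label_tokens`, the sample records, and the "most frequent field wins" rule are all assumptions made for illustration.

```python
import re
from collections import Counter, defaultdict

def build_kb(records):
    """Collect, per field, how often each term occurs in sample records."""
    kb = defaultdict(Counter)
    for record in records:
        for field, value in record.items():
            kb[field].update(re.findall(r"\w+", value.lower()))
    return kb

def label_tokens(citation, kb):
    """Label each token with the field in which it was seen most often.

    No style-specific delimiter patterns are used; tokens that the
    knowledge base has never seen are labeled "unknown".
    """
    labels = []
    for token in re.findall(r"\w+", citation):
        best_field, best_count = "unknown", 0
        for field, terms in kb.items():
            count = terms[token.lower()]
            if count > best_count:
                best_field, best_count = field, count
        labels.append((token, best_field))
    return labels

# Hypothetical sample metadata records (invented for this example).
records = [
    {"author": "Jane Doe", "title": "Keyword search in databases", "venue": "ExampleConf"},
    {"author": "John Roe", "title": "Data exchange on the web", "venue": "SampleWorkshop"},
]
kb = build_kb(records)
labels = label_tokens("Roe. Web data search. SampleWorkshop", kb)
```

A realistic system would also need to handle terms that appear in several fields and terms missing from the knowledge base, which this sketch only labels as "unknown".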

Information Processing and Management | 2007

LABRADOR: Efficiently publishing relational databases on the web by using keyword-based query interfaces

Filipe de Sá Mesquita; Altigran Soares da Silva; Edleno Silva de Moura; Pável Calado; Alberto H. F. Laender

A vast amount of valuable information, produced and consumed by people and institutions, is currently stored in relational databases. For many purposes, there is an ever-increasing demand for having these databases published on the Web, so that users can query the data available in them. An important requirement for this to happen is that query interfaces must be as simple and intuitive as possible. In this paper we present LABRADOR, a system for efficiently publishing relational databases on the Web by using a simple text box query interface. The system operates by taking an unstructured keyword-based query posed by a user and automatically deriving an equivalent SQL query that fits the user's information needs, as expressed by the original query. The SQL query is then sent to a DBMS and its results are processed by LABRADOR to create a relevance-based ranking of the answers. The experiments we present show that LABRADOR can automatically find the most suitable SQL query in more than 75% of the cases, and that the overhead introduced by the system in the overall query processing time is almost insignificant. Furthermore, the system operates in a non-intrusive way, since it requires no modifications to the target database schema.
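The pipeline described above (keyword query, derived SQL, ranked answers) can be sketched in a few lines. This is a simplified illustration and not LABRADOR's actual derivation or ranking method: the helper names `keyword_to_sql` and `rank_rows`, the `movies` table, and the "count matching keywords" score are assumptions.

```python
import sqlite3

def keyword_to_sql(table, columns, keywords):
    """Build a parameterized SELECT matching any keyword in any column.

    `table` and `columns` are assumed to be trusted identifiers taken
    from the schema; only the keywords are passed as bound parameters.
    """
    clauses, params = [], []
    for kw in keywords:
        for col in columns:
            clauses.append(f"{col} LIKE ?")
            params.append(f"%{kw}%")
    sql = f"SELECT * FROM {table} WHERE " + " OR ".join(clauses)
    return sql, params

def rank_rows(rows, keywords):
    """Order rows by how many distinct keywords each row contains."""
    def score(row):
        text = " ".join(str(value).lower() for value in row)
        return sum(kw.lower() in text for kw in keywords)
    return sorted(rows, key=score, reverse=True)

# Hypothetical database, created in memory for the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (title TEXT, director TEXT)")
conn.executemany("INSERT INTO movies VALUES (?, ?)", [
    ("Alien", "Ridley Scott"),
    ("Blade Runner", "Ridley Scott"),
    ("Aliens", "James Cameron"),
])

keywords = ["alien", "scott"]
sql, params = keyword_to_sql("movies", ["title", "director"], keywords)
rows = conn.execute(sql, params).fetchall()
ranked = rank_rows(rows, keywords)
```

The target schema is left untouched here as well: the sketch only reads from the existing table, which mirrors the non-intrusive property claimed in the abstract.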

International World Wide Web Conferences | 2009

Automatically filling form-based web interfaces with free text inputs

Guilherme A. Toda; Eli Cortez; Filipe de Sá Mesquita; Altigran Soares da Silva; Edleno Silva de Moura; Marden S. Neubert

On today's web, the most prevalent way for users to interact with data-intensive applications is through form-based interfaces composed of several data input fields, such as text boxes, radio buttons, pull-down lists, and check boxes. Although these interfaces are popular and effective, in many cases free text interfaces are preferred over form-based ones. In this paper we discuss the proposal and the implementation of a novel IR-based method for using data-rich free text to interact with form-based interfaces. Our solution takes free text as input, implicitly extracts data values from it, and fills the appropriate fields with them. For this task, we rely on the values of previous submissions for each field, which are freely obtained from the usage of form-based interfaces.
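As a rough illustration of using previously submitted values to map free text onto form fields, consider the toy sketch below. It is not the method from the paper: the function `fill_form`, the `history` dictionary, and the plain substring matching are assumptions chosen to keep the example short.

```python
def fill_form(text, history):
    """Fill each field with a previously submitted value found in `text`.

    `history` maps a field name to values seen in earlier submissions.
    Matching is a crude case-insensitive substring test; when several
    past values occur in the text, the longest one is kept.
    """
    filled = {}
    lowered = text.lower()
    for field, past_values in history.items():
        matches = [value for value in past_values if value.lower() in lowered]
        if matches:
            filled[field] = max(matches, key=len)
    return filled

# Hypothetical submission history for a car-advertisement form.
history = {
    "brand": ["Honda", "Toyota"],
    "color": ["red", "blue"],
    "fuel": ["gasoline", "diesel"],
}
filled = fill_form("Selling a blue Toyota, runs on diesel", history)
```

Fields with no matching past value are simply left out of the result, so a caller can tell which inputs still need to be entered by hand.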

International Conference on Management of Data | 2012

Clustering techniques for open relation extraction

Filipe de Sá Mesquita

This work investigates clustering techniques for Relation Extraction (RE). Relation Extraction is the task of extracting relationships among named entities (e.g., people, organizations and geo-political entities) from natural language text. We are particularly interested in the open RE scenario, where the number of target relations is too large or even unknown. Our contributions are in two aspects of the clustering process: (1) extraction and weighting of features and (2) scalability. In order to evaluate our techniques at large scale, we propose an automatic evaluation method based on pointwise mutual information. Our preliminary results show that our clustering techniques, as well as our evaluation method, are promising.
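For readers unfamiliar with pointwise mutual information (PMI), the standard definition is PMI(x, y) = log2(P(x, y) / (P(x) P(y))). The sketch below computes it from co-occurrence counts. How the abstract's evaluation method actually applies PMI is not stated here, so the function `pmi_table` and the toy observations are assumptions.

```python
import math
from collections import Counter

def pmi_table(pairs):
    """Return PMI (base 2) for every (x, y) pair observed in `pairs`."""
    pair_counts = Counter(pairs)
    x_counts = Counter(x for x, _ in pairs)
    y_counts = Counter(y for _, y in pairs)
    total = len(pairs)
    scores = {}
    for (x, y), n_xy in pair_counts.items():
        p_xy = n_xy / total
        p_x = x_counts[x] / total
        p_y = y_counts[y] / total
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return scores

# Hypothetical observations: (relation phrase, entity-type pair).
observations = [
    ("acquired", "ORG-ORG"),
    ("acquired", "ORG-ORG"),
    ("born in", "PER-LOC"),
    ("born in", "PER-LOC"),
]
scores = pmi_table(observations)
```

A positive score means the two items co-occur more often than independence would predict; in this toy data each phrase always appears with the same type pair, so both scores are positive.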

Web Information and Data Management | 2007

FleDEx: flexible data exchange

Filipe de Sá Mesquita; Denilson Barbosa; Eli Cortez; Altigran Soares da Silva

We propose a lightweight framework for data exchange that is suitable for non-expert and casual users sharing data on the Web or through peer-to-peer systems. Unlike previous work, we consider a simplistic data model and schema formalism that are suitable for describing typical online data, and propose algorithms for mapping such schemas as well as for translating the corresponding instances. Our solution requires minimal overhead and setup costs compared to existing data exchange systems, making it very attractive in the Web data exchange setting. We report experimental results indicating that our method works well with real Web data from various domains.

International ACM SIGIR Conference on Research and Development in Information Retrieval | 2010

Incorporating global information into named entity recognition systems using relational context

Yuval Merhav; Filipe de Sá Mesquita; Denilson Barbosa; Wai Gen Yee; Ophir Frieder

The state of the art in Named Entity Recognition relies on a combination of local features of the text and global knowledge to determine the types of the recognized entities. This is problematic in some cases, resulting in entities being classified as belonging to the wrong type. We show that using global information about the corpus improves the accuracy of type identification. We explore the notion of a global domain frequency that relates relation-identifying terms with the pairs of entity types that are used in that relation. We use this to identify entities whose types are not compatible with the terms they co-occur with in the text. Our results on a large corpus of social media content allow the identification of mistyped entities with 70% accuracy.
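One plausible reading of a domain frequency is the share of a term's occurrences that fall on each pair of entity types; rare pairs then hint at mistyped entities. The sketch below follows that reading only. The exact definition used in the paper is not given in the abstract, so `domain_frequency`, `flag_suspects`, the threshold, and the toy triples are assumptions.

```python
from collections import Counter, defaultdict

def domain_frequency(triples):
    """Map each term to the relative frequency of its entity-type pairs."""
    counts = defaultdict(Counter)
    for term, type1, type2 in triples:
        counts[term][(type1, type2)] += 1
    return {
        term: {pair: n / sum(c.values()) for pair, n in c.items()}
        for term, c in counts.items()
    }

def flag_suspects(triples, df, threshold=0.2):
    """Return triples whose type pair is rare for the co-occurring term."""
    return [t for t in triples if df[t[0]][(t[1], t[2])] < threshold]

# Hypothetical extractions: (relation term, type of entity 1, type of entity 2).
triples = [("married", "PER", "PER")] * 9 + [("married", "PER", "LOC")]
df = domain_frequency(triples)
suspects = flag_suspects(triples, df)
```

Here the single "married" triple linking a person to a location is flagged, since that type pair accounts for only a tenth of the term's occurrences.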

Empirical Methods in Natural Language Processing | 2013