João Carlos Pereira da Silva
Federal University of Rio de Janeiro
Publications
Featured research published by João Carlos Pereira da Silva.
international conference on natural language processing | 2011
André Freitas; João Gabriel Oliveira; Sean O'Riain; Edward Curry; João Carlos Pereira da Silva
Linked Data brings the promise of incorporating a new dimension to the Web, where the availability of Web-scale data can determine a paradigmatic transformation of the Web and its applications. However, together with its opportunities, Linked Data brings inherent challenges in the way users and applications consume the available data. Users consuming Linked Data on the Web, or on corporate intranets, should be able to search and query data spread over a potentially large number of heterogeneous, complex and distributed datasets. Ideally, a query mechanism for Linked Data should abstract users from the representation of data. This work investigates a vocabulary-independent natural language query mechanism for Linked Data, using an approach based on the combination of entity search, a Wikipedia-based semantic relatedness measure and spreading activation. The combination of these three elements in a query mechanism for Linked Data is a new contribution in the space. Wikipedia-based relatedness measures address limitations of existing works that rely on WordNet-based similarity measures and term expansion. Experimental results using the query mechanism to answer 50 natural language queries over DBpedia achieved a mean reciprocal rank of 61.4%, an average precision of 48.7% and an average recall of 57.2%, answering 70% of the queries.
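As a rough illustration of how a relatedness measure and spreading activation can combine, the sketch below propagates activation over a toy graph, following only edges whose predicate is sufficiently related to a query term. The graph, the relatedness scores, the threshold and all names are illustrative stand-ins, not the paper's actual data or algorithm.

```python
# Toy RDF-like graph: node -> list of (predicate, neighbour).
# Node and predicate names are illustrative, not the DBpedia schema.
GRAPH = {
    "Barack_Obama": [("spouse", "Michelle_Obama"), ("party", "Democratic_Party")],
    "Michelle_Obama": [("almaMater", "Princeton_University")],
    "Democratic_Party": [],
    "Princeton_University": [],
}

# Stand-in for a Wikipedia-based semantic relatedness measure; real
# systems derive such scores from link structure or co-occurrence.
RELATEDNESS = {
    ("wife", "spouse"): 0.9,
    ("wife", "party"): 0.1,
    ("wife", "almaMater"): 0.2,
}

def spread(seed, query_term, threshold=0.5, max_depth=2):
    """Propagate activation from a pivot entity, following only edges
    whose predicate is sufficiently related to the query term."""
    activated, frontier = {seed: 1.0}, [(seed, 1.0, 0)]
    while frontier:
        node, act, depth = frontier.pop()
        if depth >= max_depth:
            continue
        for pred, nbr in GRAPH.get(node, []):
            score = RELATEDNESS.get((query_term, pred), 0.0)
            new_act = act * score
            if score >= threshold and new_act > activated.get(nbr, 0.0):
                activated[nbr] = new_act
                frontier.append((nbr, new_act, depth + 1))
    return activated

print(spread("Barack_Obama", "wife"))
```

Only the `spouse` edge survives the relatedness cut, so activation reaches `Michelle_Obama` and the unrelated `party` branch is pruned.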
data and knowledge engineering | 2013
André Freitas; João Gabriel Oliveira; Sean O'Riain; João Carlos Pereira da Silva; Edward Curry
Linked Data brings inherent challenges in the way users and applications consume the available data. Users consuming Linked Data on the Web should be able to search and query data spread over potentially large numbers of heterogeneous, complex and distributed datasets. Ideally, a query mechanism for Linked Data should abstract users from the representation of data. This work investigates a vocabulary-independent natural language query mechanism for Linked Data, using an approach based on the combination of entity search, a Wikipedia-based semantic relatedness measure and spreading activation. Wikipedia-based semantic relatedness measures address limitations of existing works that rely on WordNet-based similarity measures and term expansion. Experimental results using the query mechanism to answer 50 natural language queries over DBpedia achieved a mean reciprocal rank of 61.4%, an average precision of 48.7% and an average recall of 57.2%.
applications of natural language to data bases | 2014
André Freitas; João Carlos Pereira da Silva; Edward Curry; Paul Buitelaar
Tasks such as question answering and semantic search depend on the ability to query and reason over large-scale commonsense knowledge bases (KBs). However, dealing with commonsense data demands coping with problems such as increased schema complexity, semantic inconsistency, incompleteness and scalability. This paper proposes a selective graph navigation mechanism based on a distributional relational semantic model, which can be applied to querying and reasoning over heterogeneous KBs. The approach can be used for approximative reasoning, querying and associational knowledge discovery. In this paper we focus on commonsense reasoning as the main motivational scenario. The approach addresses the following problems: (i) providing a semantic selection mechanism for facts which are relevant and meaningful in a specific reasoning and querying context, and (ii) coping with information incompleteness in large KBs. The approach is evaluated using ConceptNet as a commonsense KB, and achieved high selectivity, high scalability and high accuracy in the selection of meaningful navigational paths. Distributional semantics is also used as a principled mechanism to cope with information incompleteness.
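The core idea of distributional selection can be sketched as pruning KB edges whose relation is not distributionally close to the query context. The vectors, relation names and threshold below are invented for illustration; real distributional vectors come from corpus co-occurrence statistics.

```python
import math

# Tiny distributional space: term -> sparse context vector.
# Vectors are illustrative stand-ins for corpus-derived ones.
VEC = {
    "dangerous": {"harm": 2.0, "risk": 3.0, "weapon": 1.0},
    "UnsafeFor": {"harm": 1.0, "risk": 2.0},
    "LocatedAt": {"place": 3.0, "city": 1.0},
}

def cosine(u, v):
    dot = sum(u[k] * v.get(k, 0.0) for k in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def select_edges(query_term, edges, eta=0.3):
    """Keep only KB edges whose relation is distributionally close to
    the query context; unrelated facts are pruned before navigation."""
    q = VEC[query_term]
    return [e for e in edges if cosine(q, VEC[e[1]]) >= eta]

edges = [("knife", "UnsafeFor", "child"), ("knife", "LocatedAt", "kitchen")]
print(select_edges("dangerous", edges))
```

For the query context "dangerous", the `UnsafeFor` edge is kept and the `LocatedAt` edge is pruned, so navigation only expands paths meaningful for the query.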
international conference on control, automation, robotics and vision | 2010
Horacio L. França; João Carlos Pereira da Silva; Omar Lengerke; Max Suell Dutra; Massimo De Gregorio; Felipe M. G. França
The reproduction of the movements of a ship by automated platforms, without the use of sensors providing exact data on the numeric variables involved, is a non-trivial matter. The goal of this research is an artificial vision system that can follow the cadence of such a ship in six degrees of freedom. Considering that a real-time response is a requisite in this case, it was decided to adopt a Boolean artificial neural network system that could identify and follow arbitrary interest points which could define, as a group, a model of the movement of an observed vessel. This paper describes the development of a prototype based on the Boolean perceptron model WiSARD (Wilkie, Stonham and Aleksander's Recognition Device), which is being implemented in the C programming language on a desktop computer using a regular webcam as input.
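The WiSARD model itself is simple enough to sketch: the binary input is partitioned into small tuples by a fixed random mapping, each tuple addresses its own RAM, training writes the addressed positions, and the discriminator's response counts how many RAMs recognise their address. The sketch below (in Python rather than the C of the prototype) is a minimal illustration, not the authors' implementation.

```python
import random

class Discriminator:
    """One WiSARD discriminator: input bits are grouped into n-bit
    tuples via a fixed random mapping; each tuple addresses one RAM."""
    def __init__(self, input_bits, tuple_bits, seed=0):
        rng = random.Random(seed)
        idx = list(range(input_bits))
        rng.shuffle(idx)                      # fixed random input mapping
        self.tuples = [idx[i:i + tuple_bits]
                       for i in range(0, input_bits, tuple_bits)]
        self.rams = [set() for _ in self.tuples]

    def _addresses(self, bits):
        for ram_idx, t in enumerate(self.tuples):
            yield ram_idx, tuple(bits[i] for i in t)

    def train(self, bits):
        # Training writes a 1 at each addressed RAM position.
        for ram_idx, addr in self._addresses(bits):
            self.rams[ram_idx].add(addr)

    def response(self, bits):
        # Response = number of RAMs that have seen this address before.
        return sum(addr in self.rams[ram_idx]
                   for ram_idx, addr in self._addresses(bits))

d = Discriminator(input_bits=8, tuple_bits=2)
d.train([1, 0, 1, 0, 1, 0, 1, 0])
print(d.response([1, 0, 1, 0, 1, 0, 1, 0]))  # full response: 4 of 4 RAMs
```

A full WiSARD classifier keeps one such discriminator per class (here, per tracked interest point) and picks the one with the strongest response.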
database and expert systems applications | 2013
André Freitas; Sean O'Riain; Edward Curry; João Carlos Pereira da Silva; Danilo S. Carvalho
The integration of even a small fraction of the information present in the Web of Documents into the Linked Data Web can provide a significant shift in the amount of information available to data consumers. However, information extracted from text does not easily fit into the usually highly normalized structure of ontology-based datasets. While the representation of structured data assumes a high level of regularity and relatively simple, consistent conceptual models, the representation of information extracted from texts needs to take into account large terminological variation, complex contextual/dependency patterns, and fuzzy or conflicting semantics. This work focuses on bridging the gap between structured and unstructured data, proposing the representation of text as structured discourse graphs (SDGs), targeting an RDF representation of unstructured data. The representation focuses on a semantic best-effort information extraction scenario, where information from text is extracted under a pay-as-you-go data quality perspective, trading terminological normalization for domain independence, context capture, wider representation scope and maximization of textual information capture.
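The flavour of such a representation can be sketched with plain triples: instead of normalising an extracted relation into an ontology property, the relation keeps its textual form and is reified so that context (here, a time qualifier) can attach to it. The node and predicate names below are invented for illustration and are not the paper's SDG vocabulary.

```python
# Reified, best-effort representation of the sentence
# "Einstein developed the theory of relativity in 1915".
triples = [
    ("stmt1", "subject", "Einstein"),
    ("stmt1", "predicate", "developed"),          # verbatim textual relation
    ("stmt1", "object", "theory of relativity"),
    ("stmt1", "context:time", "1915"),            # contextual qualifier
]

def objects_of(triples, predicate_text):
    """Pay-as-you-go query: find objects of statements whose textual
    predicate matches, without requiring a prior schema."""
    stmts = {s for (s, p, o) in triples
             if p == "predicate" and o == predicate_text}
    return [o for (s, p, o) in triples if s in stmts and p == "object"]

print(objects_of(triples, "developed"))
```

Because the relation is a statement node rather than a bare edge, further qualifiers (provenance, modality, discourse context) can be attached to it later without restructuring the graph.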
international conference on natural language processing | 2011
André Freitas; João Gabriel Oliveira; Sean O'Riain; Edward Curry; João Carlos Pereira da Silva
Linked Data promises an unprecedented availability of data on the Web. However, this vision comes together with the associated challenges of querying highly heterogeneous and distributed data. In order to query Linked Data on the Web today, end-users need to be aware of which datasets potentially contain the data and the data model behind these datasets. This query paradigm, deeply attached to the traditional perspective of structured queries over databases, does not suit the heterogeneity and scale of the Web, where it is impractical for data consumers to have an a priori understanding of the structure and location of available datasets. This work describes Treo, a best-effort natural language query mechanism for Linked Data, which focuses on the problem of bridging the semantic gap between end-user natural language queries and Linked Datasets.
foundations of information and knowledge systems | 2014
João Carlos Pereira da Silva; André Freitas
Distributional semantics focuses on the automatic construction of a semantic model based on the statistical distribution of co-located words in large-scale texts. Deductive reasoning is a fundamental component of semantic understanding. Despite the generality and expressivity of logical models, from an applied perspective deductive reasoners depend on highly consistent conceptual models, which limits their application to highly heterogeneous and open-domain knowledge sources. Additionally, logical reasoners may present scalability issues. This work advances the conceptual and formal work on the interaction between distributional semantics and logic, introducing a distributional deductive inference model for large-scale and heterogeneous knowledge bases. The proposed reasoning model targets the following features: (i) an approximative ontology-agnostic reasoning approach for logical knowledge bases, (ii) the inclusion of large volumes of distributional semantics commonsense knowledge into the inference process, and (iii) the provision of a principled geometric representation of the inference process.
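One way to picture ontology-agnostic inference is an approximate modus ponens: a rule fires when a fact's predicate is distributionally close enough to the rule's antecedent, even if the symbols differ. The vectors, predicates and threshold below are invented for illustration and are much simpler than the paper's geometric model.

```python
import math

VEC = {  # illustrative distributional vectors for predicates
    "author_of": {"write": 3.0, "book": 2.0},
    "wrote": {"write": 4.0, "text": 1.0},
    "born_in": {"place": 3.0, "birth": 2.0},
}

def cosine(u, v):
    dot = sum(u[k] * v.get(k, 0.0) for k in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

RULES = [("wrote", "is_writer")]                  # wrote(X, _) -> is_writer(X)
FACTS = [("author_of", "Tolstoy", "War and Peace")]

def infer(facts, rules, eta=0.5):
    """Approximate modus ponens: the antecedent 'wrote' matches the
    fact predicate 'author_of' by distributional similarity, so the
    rule fires despite the symbol mismatch."""
    derived = []
    for ante, cons in rules:
        for pred, subj, obj in facts:
            if cosine(VEC[ante], VEC[pred]) >= eta:
                derived.append((cons, subj))
    return derived

print(infer(FACTS, RULES))
```

A strict symbolic reasoner would derive nothing here, since `author_of` and `wrote` are distinct symbols; the distributional match is what makes the inference ontology-agnostic.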
applications of natural language to data bases | 2014
André Freitas; Rafael Bezerra Vieira; Edward Curry; Danilo S. Carvalho; João Carlos Pereira da Silva
Natural language descriptors used for categorizations are present from folksonomies to ontologies. While some descriptors are composed of simple expressions, other descriptors have complex compositional patterns (e.g. ‘French Senators Of The Second Empire’, ‘Churches Destroyed In The Great Fire Of London And Not Rebuilt’). As conceptual models get more complex and decentralized, more content is transferred to unstructured natural language descriptors, increasing the terminological variation, reducing the conceptual integration and the structure level of the model. This work describes a representation for complex natural language category descriptors (NLCDs). In the representation, complex categories are decomposed into a graph of primitive concepts, supporting their interlinking and semantic interpretation. A category extractor is built and the quality of its extraction under the proposed representation model is evaluated.
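A naive flavour of such a decomposition can be sketched as below: the descriptor is split at connective words, yielding an alternating sequence of primitive concepts and operators. The operator vocabulary and output format are invented for illustration and are much simpler than the paper's representation.

```python
# Split a complex natural language category descriptor (NLCD) into
# primitive concepts linked by operators. The operator set is a toy
# stand-in, not the paper's vocabulary.
OPERATORS = {"Of", "In", "And", "Not", "The"}

def decompose(descriptor):
    graph, current = [], []
    for tok in descriptor.split():
        if tok in OPERATORS:
            if current:
                graph.append(("concept", " ".join(current)))
                current = []
            if tok != "The":                # determiners carry no structure
                graph.append(("operator", tok))
        else:
            current.append(tok)
    if current:
        graph.append(("concept", " ".join(current)))
    return graph

print(decompose("French Senators Of The Second Empire"))
```

The descriptor decomposes into the primitives "French Senators" and "Second Empire" linked by an "Of" operator, each of which can then be interlinked and interpreted on its own.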
extended semantic web conference | 2013
Danilo S. Carvalho; André Freitas; João Carlos Pereira da Silva
This demo presents Graphia, an information extraction pipe-line targeting an RDF representation of unstructured data in the form of structured discourse graphs (SDGs). It combines natural language processing and information extraction techniques with the use of linked open data resources and semantic web technologies to enable discourse representation as a set of contextualized relationships between entities.
international conference on logic programming | 1995
João Carlos Pereira da Silva; Sheila R. M. Veloso
The purpose of this paper is to show that we can consider an extension of a Reiter default theory (W,Δ) as the expansion of the (belief) set W by some maximal set D of consequences of defaults in Δ. We use the model of revision functions proposed by Grove [13] to characterize the models of the extensions in Reiter's default logic [24], showing that the class of models obtained in the special case when a revision is an expansion (i.e., a new sentence A is added to a belief set K and no sentence in K is deleted) is the class of models of some extension in Reiter's default logic. Furthermore, we show that the class of models in Poole's system for default reasoning can be characterized in the same way.
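As a toy illustration of how defaults expand a belief set W, the sketch below applies a default (prerequisite, justification, consequent) whenever its prerequisite is derived and its justification is not contradicted. It treats formulas as atomic literals ('~p' negating 'p'), a drastic simplification of Reiter's logic, and is not the paper's construction.

```python
def neg(lit):
    """Negate an atomic literal: 'p' <-> '~p'."""
    return lit[1:] if lit.startswith("~") else "~" + lit

def extension(W, defaults):
    """Expand W by applying defaults (pre, just, cons) whose
    prerequisite holds and whose justification is consistent."""
    E = set(W)
    changed = True
    while changed:
        changed = False
        for pre, just, cons in defaults:
            if pre in E and neg(just) not in E and cons not in E:
                E.add(cons)
                changed = True
    return E

# Classic example: birds normally fly; Tweety is a bird.
W = {"bird"}
D = [("bird", "flies", "flies")]
print(sorted(extension(W, D)))  # ['bird', 'flies']

# If W already contains '~flies' (a penguin), the default is blocked.
print(sorted(extension({"bird", "~flies"}, D)))
```

The second call shows the defeasibility: the same default that fired for an ordinary bird is blocked when its justification is contradicted, which is exactly the behaviour the expansion-based characterization must preserve.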