Publications


Featured research published by Pablo N. Mendes.


Semantic Web | 2015

DBpedia – A large-scale, multilingual knowledge base extracted from Wikipedia

Jens Lehmann; Robert Isele; Max Jakob; Anja Jentzsch; Dimitris Kontokostas; Pablo N. Mendes; Sebastian Hellmann; Mohamed Morsey; Patrick van Kleef; Sören Auer

The DBpedia community project extracts structured, multilingual knowledge from Wikipedia and makes it freely available on the Web using Semantic Web and Linked Data technologies. The project extracts knowledge from 111 different language editions of Wikipedia. The largest DBpedia knowledge base, which is extracted from the English edition of Wikipedia, consists of over 400 million facts that describe 3.7 million things. The DBpedia knowledge bases that are extracted from the other 110 Wikipedia editions together consist of 1.46 billion facts and describe 10 million additional things. The DBpedia project maps Wikipedia infoboxes from 27 different language editions to a single shared ontology consisting of 320 classes and 1,650 properties. The mappings are created via a worldwide crowd-sourcing effort and enable knowledge from the different Wikipedia editions to be combined. The project publishes releases of all DBpedia knowledge bases for download and provides SPARQL query access to 14 out of the 111 language editions via a global network of local DBpedia chapters. In addition to the regular releases, the project maintains a live knowledge base which is updated whenever a page in Wikipedia changes. DBpedia sets 27 million RDF links pointing into over 30 external data sources and thus enables data from these sources to be used together with DBpedia data. Several hundred data sets on the Web publish RDF links pointing to DBpedia themselves and make DBpedia one of the central interlinking hubs in the Linked Open Data (LOD) cloud. In this system report, we give an overview of the DBpedia community project, including its architecture, technical implementation, maintenance, internationalisation, usage statistics and applications.
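
The SPARQL query access mentioned above is served by the public endpoint at dbpedia.org/sparql. Below is a minimal sketch of querying it from Python with the SPARQLWrapper library; the endpoint URL and example resource are the public ones, not specific to this paper:

```python
# A minimal sketch of the SPARQL access described above, against the public
# DBpedia endpoint. Requires: pip install sparqlwrapper
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setReturnFormat(JSON)

# Ask for a few facts about one "thing" in the English DBpedia knowledge base.
sparql.setQuery("""
    SELECT ?property ?value
    WHERE { <http://dbpedia.org/resource/Berlin> ?property ?value }
    LIMIT 10
""")

for row in sparql.query().convert()["results"]["bindings"]:
    print(row["property"]["value"], "->", row["value"]["value"])
```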


International Conference on Semantic Systems | 2011

DBpedia spotlight: shedding light on the web of documents

Pablo N. Mendes; Max Jakob; Andrés García-Silva

Interlinking text documents with Linked Open Data enables the Web of Data to be used as background knowledge within document-oriented applications such as search and faceted browsing. As a step towards interconnecting the Web of Documents with the Web of Data, we developed DBpedia Spotlight, a system for automatically annotating text documents with DBpedia URIs. DBpedia Spotlight allows users to configure the annotations to their specific needs through the DBpedia Ontology and quality measures such as prominence, topical pertinence, contextual ambiguity and disambiguation confidence. We compare our approach with the state of the art in disambiguation, and evaluate our results in light of three baselines and six publicly available annotation systems, demonstrating the competitiveness of our system. DBpedia Spotlight is shared as open source and deployed as a Web Service freely available for public use.
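
DBpedia Spotlight's annotation service is exposed over HTTP. Here is a hedged sketch of calling the public annotate endpoint from Python; the URL reflects the service as deployed today and may differ from the 2011 deployment described in the paper:

```python
# A sketch of annotating text via the public DBpedia Spotlight web service.
# Requires: pip install requests
import requests

resp = requests.get(
    "https://api.dbpedia-spotlight.org/en/annotate",
    params={
        "text": "Berlin is the capital of Germany.",
        "confidence": 0.5,  # disambiguation-confidence threshold (0..1)
    },
    headers={"Accept": "application/json"},
)
resp.raise_for_status()

# Each recognized surface form is linked to a DBpedia URI.
for res in resp.json().get("Resources", []):
    print(res["@surfaceForm"], "->", res["@URI"])
```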


International Conference on Semantic Systems | 2013

Improving efficiency and accuracy in multilingual entity extraction

Joachim Daiber; Max Jakob; Chris Hokamp; Pablo N. Mendes

There has recently been increased interest in named entity recognition and disambiguation systems at major conferences such as WWW, SIGIR, ACL and KDD. However, most work has focused on algorithms and evaluations, leaving little space for implementation details. In this paper, we discuss some implementation and data processing challenges we encountered while developing a new multilingual version of DBpedia Spotlight that is faster, more accurate and easier to configure. We compare our solution to the previous system, considering time performance, space requirements and accuracy in the context of the Dutch and English languages. Additionally, we report results for 9 additional languages among the largest Wikipedias. Finally, we present challenges and experiences to foster discussion with other developers interested in recognition and disambiguation of entities in natural language text.
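
One ingredient such entity extraction systems typically rely on is the prior probability of an entity given a surface form, estimated from Wikipedia link-anchor counts. The sketch below illustrates that general idea only, not the paper's exact model; the counts are hypothetical stand-ins for statistics mined from a Wikipedia dump:

```python
# Rank candidate entities for a surface form by P(entity | surface form),
# estimated from (hypothetical) link-anchor counts.
from collections import Counter

# anchor_counts[surface_form][entity] = times that anchor text linked to entity
anchor_counts = {
    "Washington": Counter({
        "George_Washington": 4200,
        "Washington,_D.C.": 3900,
        "Washington_(state)": 2500,
    }),
}

def candidate_priors(surface_form: str) -> list[tuple[str, float]]:
    counts = anchor_counts.get(surface_form, Counter())
    total = sum(counts.values())
    return [(entity, n / total) for entity, n in counts.most_common()]

print(candidate_priors("Washington"))
# [('George_Washington', 0.396...), ('Washington,_D.C.', 0.368...), ...]
```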


EDBT/ICDT Workshops | 2012

Sieve: linked data quality assessment and fusion

Pablo N. Mendes; Hannes Mühleisen

The Web of Linked Data grows rapidly and already contains data originating from hundreds of data sources. The quality of data from those sources is very diverse, as values may be out of date, incomplete or incorrect. Moreover, data sources may provide conflicting values for a single real-world object. In order for Linked Data applications to consume data from this global data space in an integrated fashion, a number of challenges have to be overcome. One of these challenges is to rate and to integrate data based on their quality. However, quality is a very subjective matter, and finding a canonical judgement that is suitable for each and every task is not feasible. To simplify the task of consuming high-quality data, we present Sieve, a framework for flexibly expressing quality assessment methods as well as fusion methods. Sieve is integrated into the Linked Data Integration Framework (LDIF), which handles Data Access, Schema Mapping and Identity Resolution, all crucial preliminaries for quality assessment and fusion. We demonstrate Sieve in a data integration scenario importing data from the English and Portuguese versions of DBpedia, and discuss how we increase completeness, conciseness and consistency through the use of our framework.
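
Sieve itself expresses assessment metrics and fusion functions declaratively; the following Python sketch only illustrates one such fusion policy, keeping the value whose source scores best on a recency-based quality metric. The data and scoring are illustrative, not Sieve's actual configuration:

```python
# Resolve conflicting values for one property by a recency-based fusion policy.
from datetime import date

# Hypothetical conflicting population values from two DBpedia editions.
conflicting = [
    {"value": 3_450_889, "source": "en", "last_update": date(2012, 5, 1)},
    {"value": 3_420_768, "source": "pt", "last_update": date(2011, 9, 12)},
]

def fuse_keep_most_recent(values):
    """Fusion function: the value from the most recently updated source wins."""
    return max(values, key=lambda v: v["last_update"])

fused = fuse_keep_most_recent(conflicting)
print(fused["value"], "from", fused["source"])  # 3450889 from en
```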


International Semantic Web Conference | 2012

Managing the life-cycle of linked data with the LOD2 stack

Sören Auer; Lorenz Bühmann; Christian Dirschl; Orri Erling; Michael Hausenblas; Robert Isele; Jens Lehmann; Michael Martin; Pablo N. Mendes; Bert Van Nuffelen; Claus Stadler; Sebastian Tramp; Hugh Williams

The LOD2 Stack is an integrated distribution of aligned tools that support the whole life cycle of Linked Data, from extraction and authoring/creation via enrichment, interlinking and fusing through to maintenance. The LOD2 Stack comprises new tools and substantially extended existing tools from the LOD2 project partners and third parties. The stack is designed to be versatile; for all functionality we define clear interfaces, which enable the plugging in of alternative third-party implementations. The architecture of the LOD2 Stack is based on three pillars: (1) software integration and deployment using the Debian packaging system; (2) use of a central SPARQL endpoint and standardized vocabularies for knowledge base access and integration between the different tools of the LOD2 Stack; (3) integration of the LOD2 Stack user interfaces based on REST-enabled Web applications. These three pillars comprise the methodological and technological framework for integrating the very heterogeneous LOD2 Stack components into a consistent framework. In this article we describe these pillars in more detail and give an overview of the individual LOD2 Stack components. The article also includes a description of a real-world usage scenario in the publishing domain.
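
Pillar (2) can be pictured as two stack components exchanging knowledge through one shared endpoint. The sketch below is a loose illustration with a hypothetical local endpoint URL and graph name; it is not LOD2 code:

```python
# Two components integrating through a central SPARQL endpoint.
# Requires: pip install sparqlwrapper
from SPARQLWrapper import SPARQLWrapper, JSON, POST

ENDPOINT = "http://localhost:8890/sparql"  # hypothetical local endpoint

# Component A publishes a triple into a shared named graph via SPARQL Update.
writer = SPARQLWrapper(ENDPOINT)
writer.setMethod(POST)
writer.setQuery("""
    INSERT DATA { GRAPH <http://example.org/lod2/extraction> {
        <http://example.org/doc/1> <http://purl.org/dc/terms/title> "Example" .
    } }
""")
writer.query()

# Component B, a different tool, reads the same graph back.
reader = SPARQLWrapper(ENDPOINT)
reader.setReturnFormat(JSON)
reader.setQuery(
    "SELECT ?s ?o WHERE { GRAPH <http://example.org/lod2/extraction> { ?s ?p ?o } }"
)
print(reader.query().convert()["results"]["bindings"])
```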


IEEE International Conference on Semantic Computing | 2008

TcruziKB: Enabling Complex Queries for Genomic Data Exploration

Pablo N. Mendes; Bobby McKnight; Amit P. Sheth; Jessica C. Kissinger

We developed a novel analytical environment to aid in the examination of the extensive amount of interconnected data available for genome projects. Our focus is to enable flexibility and abstraction from implementation details, while retaining the expressivity required for post-genomic research. To achieve this goal, we associated genomics data to ontologies and implemented a query formulation and execution environment with added visualization capabilities. We use ontology schemas to guide the user through the process of building complex queries in a flexible Web interface. Queries are serialized in SPARQL and sent to servers via Ajax. A component for visualization of the results allows researchers to explore result sets in multiple perspectives to suit different analytical needs. We show a use case of semantic computing with real-world data. We demonstrate facilitated access to information through expressive queries in a flexible and friendly user interface. Our system scores 90.54% in a user satisfaction evaluation with 30 subjects. In comparison with traditional genome databases, preliminary evaluation indicates a reduction of the amount of user interaction required to answer the provided sample queries.
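
The paper's core interaction is serializing ontology-guided user selections into SPARQL before sending them to the server. Below is a hedged sketch of that step; the class and property URIs are hypothetical, not TcruziKB's actual schema:

```python
# Serialize a (class, property, value) selection into a SPARQL query string.
def build_sparql(cls: str, prop: str, value: str) -> str:
    """Turn an ontology-guided user selection into SPARQL."""
    return f"""PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?instance ?label WHERE {{
    ?instance a <{cls}> ;
              <{prop}> "{value}" ;
              rdfs:label ?label .
}}"""

# e.g. "find genes located on chromosome 3" (hypothetical schema URIs)
query = build_sparql(
    "http://example.org/onto#Gene",
    "http://example.org/onto#chromosome",
    "3",
)
print(query)  # this string would be sent to the server via Ajax, as in the paper
```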


Web Intelligence | 2010

Linked Open Social Signals

Pablo N. Mendes; Alexandre Passant; Pavan Kapanipathi; Amit P. Sheth

In this paper we discuss the collection, semantic annotation and analysis of real-time social signals from microblogging data. We focus on users interested in analyzing social signals collectively for sensemaking. Our proposal enables flexibility in selecting subsets for analysis, alleviating information overload. We define an architecture that is based on state-of-the-art Semantic Web technologies and a distributed publish-subscribe protocol for real-time communication. In addition, we discuss our method and application in a scenario related to the health care reform in the United States.
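
The routing idea behind such a publish-subscribe architecture can be sketched in a few lines: subscribers register interest in an annotation URI, and each semantically annotated post is delivered to matching subscribers. This in-process toy stands in for the distributed protocol the paper actually uses; all names are illustrative:

```python
# Minimal topic-based pub/sub keyed by annotation URIs.
from collections import defaultdict
from typing import Callable

subscribers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)

def subscribe(topic_uri: str, callback: Callable[[dict], None]) -> None:
    subscribers[topic_uri].append(callback)

def publish(post: dict) -> None:
    # 'annotations' holds resource URIs extracted from the post's text.
    for uri in post["annotations"]:
        for callback in subscribers[uri]:
            callback(post)

subscribe("http://dbpedia.org/resource/Health_care_reform",
          lambda p: print("matched:", p["text"]))
publish({"text": "Senate debates the health care bill",
         "annotations": ["http://dbpedia.org/resource/Health_care_reform"]})
```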


International Conference on Semantic Systems | 2010

Twarql: tapping into the wisdom of the crowd

Pablo N. Mendes; Alexandre Passant; Pavan Kapanipathi

Twarql is an infrastructure that translates microblog posts from Twitter into Linked Open Data in real time. The approach employed in Twarql can be summarized as follows: (1) extract content (e.g. entity mentions, hashtags and URLs) from microposts streamed from Twitter; (2) encode content in RDF using shared and well-known vocabularies (FOAF, SIOC, MOAT, etc.); (3) enable structured querying of microposts with SPARQL; (4) enable subscription to a stream of microposts that match a given query; and (5) enable scalable real-time delivery of streaming annotated data using sparqlPuSH. In this paper we use a brand tracking scenario to demonstrate how Twarql enables flexibility in handling the information overload of those interested in collectively analyzing microblog data for sensemaking. The dataset produced is shared as Linked Data. Twarql is available as open source and can be easily deployed or extended for monitoring Twitter data in various contexts such as brand tracking, disaster relief management, stock exchange monitoring, etc.
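
Step (2) above, encoding micropost content in RDF with shared vocabularies, might look roughly like the following rdflib sketch. The triple layout is an illustration, not necessarily Twarql's exact modeling:

```python
# Encode one micropost in RDF with SIOC and FOAF. Requires: pip install rdflib
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import FOAF, RDF

SIOC = Namespace("http://rdfs.org/sioc/ns#")

g = Graph()
g.bind("sioc", SIOC)
g.bind("foaf", FOAF)

post = URIRef("http://example.org/tweet/42")      # illustrative post URI
author = URIRef("http://example.org/user/alice")  # illustrative author URI

g.add((post, RDF.type, SIOC.Post))
g.add((post, SIOC.content, Literal("Tracking #dbpedia mentions in real time")))
g.add((post, SIOC.has_creator, author))
g.add((post, SIOC.topic, URIRef("http://dbpedia.org/resource/DBpedia")))
g.add((author, RDF.type, FOAF.Agent))

print(g.serialize(format="turtle"))  # ready for SPARQL querying / streaming
```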


Knowledge Acquisition, Modeling and Management | 2008

Unsupervised Discovery of Compound Entities for Relationship Extraction

Cartic Ramakrishnan; Pablo N. Mendes; Shaojun Wang; Amit P. Sheth

In this paper we investigate unsupervised population of a biomedical ontology via information extraction from biomedical literature. Relationships in text seldom connect simple entities. We therefore focus on identifying compound entities rather than mentions of simple entities. We present a method based on rules over grammatical dependency structures for unsupervised segmentation of sentences into compound entities and relationships. We complement the rule-based approach with a statistical component that prunes structures with low information content, thereby reducing false positives in the prediction of compound entities, their constituents and relationships. The extraction is manually evaluated with respect to the UMLS Semantic Network by analyzing the conformance of the extracted triples with the corresponding UMLS relationship type definitions.
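
A drastically simplified version of the rule-over-dependency-structures idea: group a head noun with its compound and adjectival modifiers into one compound entity. The sketch uses spaCy's dependency labels and is far coarser than the paper's rule set:

```python
# Extract candidate compound entities from a dependency parse.
# Requires: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("The epidermal growth factor receptor activates downstream signaling pathways.")

def compound_entities(doc):
    for token in doc:
        if token.pos_ in ("NOUN", "PROPN"):
            # Gather left-attached 'compound' and 'amod' modifiers of the head noun.
            mods = [t for t in token.lefts if t.dep_ in ("compound", "amod")]
            if mods:
                yield " ".join(t.text for t in mods + [token])

print(list(compound_entities(doc)))
# e.g. ['epidermal growth factor receptor', 'downstream signaling pathways']
```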


International Conference on Knowledge Capture | 2011

Multipedia: enriching DBpedia with multimedia information

Andrés García-Silva; Max Jakob; Pablo N. Mendes

Enriching knowledge bases with multimedia information makes it possible to complement textual descriptions with visual and audio information. Such complementary information can help users to understand the meaning of assertions, and in general improve the user experience with the knowledge base. In this paper we address the problem of how to enrich ontology instances with candidate images retrieved from existing Web search engines. DBpedia has evolved into a major hub in the Linked Data cloud, interconnecting millions of entities organized under a consistent ontology. Our approach taps into the Wikipedia corpus to gather context information for DBpedia instances and takes advantage of image tagging information, when available, to calculate semantic relatedness between instances and candidate images. We performed experiments with a focus on the particularly challenging problem of highly ambiguous names. Both methods presented in this work outperformed the baseline. Our best method leveraged context words from Wikipedia, tags from Flickr and type information from DBpedia to achieve an average precision of 80%.
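
The relatedness computation can be approximated by a simple set overlap between an instance's Wikipedia context words and a candidate image's tags. Jaccard overlap below stands in for the paper's actual measure, and all data is illustrative:

```python
# Rank candidate images for an ambiguous name by tag/context overlap.
def jaccard(a: set[str], b: set[str]) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0

# Context words gathered from the instance's Wikipedia article (illustrative).
context_words = {"jaguar", "cat", "predator", "amazon", "rainforest"}

candidate_images = {
    "img1.jpg": {"jaguar", "cat", "wildlife", "rainforest"},  # the animal
    "img2.jpg": {"jaguar", "car", "engine", "speed"},         # the car brand
}

ranked = sorted(candidate_images.items(),
                key=lambda kv: jaccard(context_words, kv[1]),
                reverse=True)
for name, tags in ranked:
    print(name, round(jaccard(context_words, tags), 2))
# img1.jpg scores higher: its tags match the disambiguating context
```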

Collaboration


Dive into Pablo N. Mendes's collaborations.

Top Co-Authors

Max Jakob

Free University of Berlin
