Publication


Featured research published by Diego Esteves.


International Conference on Semantic Systems | 2015

MEX vocabulary: a lightweight interchange format for machine learning experiments

Diego Esteves; Diego Moussallem; Ciro Baron Neto; Tommaso Soru; Markus Ackermann; Jens Lehmann

Over the last decades, many machine learning experiments have been published, contributing to scientific progress. For experiment results to be compared with each other and built upon, experiments need to be performed rigorously in the same computing environment, using the same sample datasets and algorithm configurations. In addition, practical experience shows that scientists and engineers tend to produce large volumes of output data in their experiments, which are difficult to analyze and archive properly without provenance metadata. However, the Linked Data community still lacks a lightweight specification for interchanging machine learning metadata across different architectures to achieve a higher level of interoperability. In this paper, we address this gap by presenting a novel vocabulary dubbed MEX. We show that MEX provides a straightforward way to describe experiments, with a special focus on data provenance, and fulfills the requirements for long-term maintenance.
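To make the idea concrete, below is a minimal sketch, assuming rdflib, of how one experiment run could be described as RDF in the spirit of such a vocabulary; the namespace URI and property names are illustrative placeholders, not the published MEX terms.

```python
# Minimal sketch: describing an ML experiment run as RDF, in the spirit of MEX.
# NOTE: the namespace URI and property names below are illustrative placeholders,
# not the authoritative MEX vocabulary terms.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF

MEX = Namespace("http://example.org/mex#")  # placeholder namespace (assumption)

g = Graph()
g.bind("mex", MEX)

run = URIRef("http://example.org/experiments/run-001")
g.add((run, RDF.type, MEX.ExperimentRun))
g.add((run, MEX.algorithm, Literal("RandomForest")))
g.add((run, MEX.dataset, Literal("iris")))
g.add((run, MEX.hyperparameter, Literal("n_estimators=100")))
g.add((run, MEX.measure, Literal("accuracy")))
g.add((run, MEX.value, Literal(0.95)))

# Serialize the provenance metadata as Turtle for interchange.
print(g.serialize(format="turtle"))
```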


Journal of Web Semantics | 2015

DeFacto - Temporal and multilingual Deep Fact Validation

Daniel Gerber; Diego Esteves; Jens Lehmann; Lorenz Bühmann; Axel-Cyrille Ngonga Ngomo; René Speck

One of the main tasks when creating and maintaining knowledge bases is to validate facts and provide sources for them, in order to ensure correctness and traceability of the provided knowledge. So far, this task has often been addressed by human curators in a three-step process: issuing appropriate keyword queries for the statement to check using standard search engines, retrieving potentially relevant documents, and screening those documents for relevant content. The drawbacks of this process are manifold. Most importantly, it is very time-consuming, as the experts have to carry out several searches and often read several documents. In this article, we present DeFacto (Deep Fact Validation), an algorithm able to validate facts by finding trustworthy sources for them on the Web. DeFacto aims to provide an effective way of validating facts by supplying the user with relevant excerpts of web pages as well as useful additional information, including a score for the confidence DeFacto has in the correctness of the input fact. To achieve this goal, DeFacto collects and combines evidence from web pages written in several languages. In addition, DeFacto provides support for facts with a temporal scope, i.e., it can estimate in which time frame a fact was valid. Given that the automatic validation of facts has received little attention so far, generic benchmarks for evaluating such frameworks were not previously available. We thus also present a generic evaluation framework for fact checking and make it publicly available.
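As a rough illustration of the evidence-combination step described above (and not DeFacto's actual scoring model), the sketch below aggregates hypothetical per-page trustworthiness and relevance scores into a single confidence value; all names and the weighting scheme are assumptions.

```python
# Simplified illustration of combining per-page evidence into a confidence score.
# This is NOT DeFacto's scoring model; the weighting scheme is an assumption.
from dataclasses import dataclass

@dataclass
class Evidence:
    page_url: str
    trustworthiness: float  # 0..1, how reliable the source is judged to be
    relevance: float        # 0..1, how strongly the page supports the fact

def confidence(evidence: list[Evidence]) -> float:
    """Weight each page's support by source trustworthiness and normalize."""
    if not evidence:
        return 0.0
    weighted = sum(e.trustworthiness * e.relevance for e in evidence)
    return weighted / len(evidence)

support = [
    Evidence("https://example.org/a", trustworthiness=0.9, relevance=0.8),
    Evidence("https://example.org/b", trustworthiness=0.4, relevance=0.6),
]
print(f"confidence = {confidence(support):.2f}")
```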


International Conference on Web Engineering | 2017

Named Entity Recognition in Twitter Using Images and Text

Diego Esteves; Rafael Peres; Jens Lehmann; Giulio Napolitano

Named Entity Recognition (NER) is an important subtask of information extraction that seeks to locate and recognise named entities. Despite recent achievements, we still face limitations in correctly detecting and classifying entities, particularly in short and noisy text such as tweets. An important drawback of most NER approaches is their heavy dependency on hand-crafted features and domain-specific knowledge, which are necessary to achieve state-of-the-art results. Thus, devising models to deal with such linguistically complex contexts remains challenging. In this paper, we propose a novel multi-level architecture that does not rely on any specific linguistic resource or encoded rule. Unlike traditional approaches, we use features extracted from both images and text to classify named entities. Experiments against state-of-the-art Twitter NER systems on the Ritter dataset show competitive results (0.59 F-measure), indicating that this approach may lead to better NER models.
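As a hedged sketch of the general idea of combining image and text features (not the paper's multi-level architecture), the example below concatenates two feature matrices and trains a standard classifier; all dimensions and data are synthetic assumptions.

```python
# Sketch of late feature fusion for entity classification: concatenate text
# and image feature vectors, then train a standard classifier. Illustrative
# only; not the multi-level architecture proposed in the paper.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

n, text_dim, img_dim = 200, 300, 512            # toy sizes (assumptions)
text_feats = rng.normal(size=(n, text_dim))      # e.g. word-embedding features
img_feats = rng.normal(size=(n, img_dim))        # e.g. CNN image features
labels = rng.integers(0, 4, size=n)              # toy labels (e.g. PER/LOC/ORG/O)

# Fuse modalities by simple concatenation along the feature axis.
fused = np.concatenate([text_feats, img_feats], axis=1)

clf = LogisticRegression(max_iter=1000).fit(fused, labels)
print("train accuracy:", clf.score(fused, labels))
```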


International Conference on Semantic Systems | 2017

IDOL: Comprehensive & Complete LOD Insights

Ciro Baron Neto; Dimitris Kontokostas; Amit Kirschenbaum; Gustavo Publio; Diego Esteves; Sebastian Hellmann

Over the last decade, we have observed a steadily increasing number of RDF datasets made available on the Web of Data. The decentralized nature of the web, however, makes it hard to identify all of these datasets. Moreover, even when downloadable data distributions are discovered, the available metadata is often insufficient to describe the datasets properly, posing barriers to their usefulness and reuse. In this paper, we describe an attempt to exhaustively identify the whole Linked Open Data cloud by harvesting metadata from multiple sources, providing insights about duplicated data and the general quality of the available metadata. This was made feasible by using a probabilistic data structure called a Bloom filter. Finally, we publish a dump file containing metadata that can further be used to enrich existing datasets.
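Since the abstract credits a Bloom filter with making duplicate detection feasible, here is a minimal, self-contained Bloom filter sketch for membership tests over resource URIs; it is illustrative only and not the IDOL implementation, and the sizes and hash scheme are assumptions.

```python
# Minimal Bloom filter sketch for probabilistic membership tests over URIs.
# Not the IDOL implementation; bit-array size and hashing are illustrative.
import hashlib

class BloomFilter:
    def __init__(self, size_bits: int = 1 << 20, num_hashes: int = 5):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item: str):
        # Derive several bit positions from salted SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # May return false positives, never false negatives.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))

bf = BloomFilter()
bf.add("http://dbpedia.org/resource/Berlin")
print("http://dbpedia.org/resource/Berlin" in bf)  # True
print("http://dbpedia.org/resource/Paris" in bf)   # almost certainly False
```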


International Journal of Advanced Research in Artificial Intelligence | 2013

Prediction of assets behavior in financial series using machine learning algorithms

Diego Esteves; Julio Cesar Duarte

The prediction of financial assets using either classification or regression models is a challenge that has been growing in recent years, despite the large number of published forecasting models for this task. Essentially, the non-linear tendency of the series and the unexpected behavior of assets (compared to forecasts produced by fundamental or technical analysis) make this problem very hard to solve. In this work, we present modeling techniques for this task based on Support Vector Machines (SVM), together with a comparative performance analysis against other basic machine learning approaches, such as Logistic Regression and Naive Bayes. We use an evaluation set based on company stocks of the BVM&F, the official stock market in Brazil, the third largest in the world. We show good prediction results and conclude that it is not possible to find a single model that generates good results for every asset. We also show how to evaluate the relevant parameters for each model. The generated model can also provide additional information to other approaches, such as regression models.
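A hedged scikit-learn sketch of this kind of comparison (SVM versus Logistic Regression and Naive Bayes) is shown below; the features and labels are synthetic placeholders, not the paper's BVM&F data or indicators.

```python
# Sketch of comparing SVM against baseline classifiers for predicting
# asset direction (up/down). Data and features are synthetic placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(42)
X = rng.normal(size=(500, 10))  # e.g. returns / technical-indicator features
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=500) > 0).astype(int)

models = {
    "SVM (RBF)": make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0)),
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression()),
    "Naive Bayes": GaussianNB(),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean accuracy = {scores.mean():.3f}")
```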


Journal of Data and Information Quality | 2018

Toward Veracity Assessment in RDF Knowledge Bases: An Exploratory Analysis

Diego Esteves; Anisa Rula; Aniketh Janardhan Reddy; Jens Lehmann

Among the different characteristics of knowledge bases, data quality is one of the most relevant for maximizing the benefits of the provided information. Knowledge base quality assessment poses a number of big data challenges, such as high volume, variety, velocity, and veracity. In this article, we focus on answering questions related to the assessment of the veracity of facts through Deep Fact Validation (DeFacto), a triple validation framework designed to assess facts in RDF knowledge bases. Despite current developments in the research area, the underlying framework faces many challenges. This article pinpoints and discusses these issues and conducts a thorough analysis of its pipeline, aiming at reducing error propagation through its components. Furthermore, we discuss recent developments related to fact validation, as well as the advantages and drawbacks of state-of-the-art models. As a result of this exploratory analysis, we give insights and directions toward a better architecture to tackle the complex task of fact-checking in knowledge bases.


International Conference on Knowledge Capture | 2017

Bidirectional LSTM with a Context Input Window for Named Entity Recognition in Tweets

Rafael Peres; Diego Esteves; Gaurav Maheshwari

With the increasing popularity of social media technologies, applying natural language processing to mine information from tweets has become a challenging task that has attracted significant research effort. In contrast to news text and other formal content, tweets pose a number of new challenges due to their short and noisy nature. Thus, over the past decade, different Named Entity Recognition (NER) architectures have been proposed to solve this problem. However, most of them are based on handcrafted features and restricted to a particular domain, which imposes a natural barrier to generalizing over different contexts. Moreover, despite the long line of work on NER in formal domains, there are no studies on NER for tweets in Portuguese (despite 17.97 million monthly active users). To bridge this gap, we present a new gold-standard corpus of tweets annotated for Person, Location, and Organization (PLO). Additionally, we perform multiple NER experiments using a variety of Long Short-Term Memory (LSTM) based models without resorting to any handcrafted rules. Our approach with a centered context input window of word embeddings yields an F1 score of 52.78, 38.68% higher than a state-of-the-art baseline system.
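The key input representation mentioned above is a centered context window of word embeddings. The sketch below shows one plausible way to build such windows with zero-padding at sentence boundaries; the window size, embedding dimension, and random embeddings are assumptions, not the paper's configuration.

```python
# Sketch: build a centered context window of word embeddings per token,
# as input to an LSTM-based tagger. Sizes and embeddings are illustrative.
import numpy as np

def context_windows(embeddings: np.ndarray, window: int = 2) -> np.ndarray:
    """For each token, concatenate embeddings of [t-window, ..., t, ..., t+window],
    padding with zero vectors at sentence boundaries."""
    n_tokens, dim = embeddings.shape
    padded = np.vstack([np.zeros((window, dim)), embeddings, np.zeros((window, dim))])
    return np.stack([padded[i:i + 2 * window + 1].reshape(-1) for i in range(n_tokens)])

rng = np.random.default_rng(0)
sentence_embeddings = rng.normal(size=(6, 50))  # 6 tokens, 50-dim embeddings (toy)
windows = context_windows(sentence_embeddings, window=2)
print(windows.shape)  # (6, 250): each token sees 5 concatenated embeddings
```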


Proceedings of the International Conference on Web Intelligence | 2017

LOG4MEX: a library to export machine learning experiments

Diego Esteves; Diego Moussallem; Tommaso Soru; Ciro Baron Neto; Jens Lehmann; Axel-Cyrille Ngonga Ngomo; Julio Cesar Duarte

The choice of the best computational solution for a particular task is increasingly reliant on experimentation. Even though experiments are often described through text, tables, and figures, their descriptions are often incomplete or confusing. Thus, researchers often have to perform lengthy web searches to reproduce and understand the results. To minimize this gap, vocabularies and ontologies have been proposed for representing data mining and machine learning (ML) experiments. However, we still lack proper tools to export this metadata. To this end, we present an open-source library dubbed LOG4MEX, which aims at supporting the scientific community in filling this gap.


International Conference on Semantic Systems | 2016

MEX Interfaces: Automating Machine Learning Metadata Generation

Diego Esteves; Pablo N. Mendes; Diego Moussallem; Julio Cesar Duarte; Amrapali Zaveri; Jens Lehmann


SEMANTiCS (Posters, Demos, SuCCESS) | 2016

WASOTA: What Are the States Of The Art?

Ciro Baron Neto; Diego Esteves; Tommaso Soru; Diego Moussallem; André Valdestilhas; Edgard Marx

Collaboration


Dive into Diego Esteves's collaborations with his top co-authors.


Aniketh Janardhan Reddy

Birla Institute of Technology and Science


Rafael Peres

Federal University of Rio de Janeiro
