Publication


Featured research published by Mário J. Silva.


Data and Knowledge Engineering | 2007

Measuring semantic similarity between Gene Ontology terms

Francisco M. Couto; Mário J. Silva; Pedro M. Coutinho

Many bioinformatics applications would benefit from comparing proteins based on their biological role rather than their sequence. This paper makes two contributions. First, a study of the correlation between Gene Ontology (GO) terms and family similarity demonstrates that protein families constitute an appropriate baseline for validating GO similarity. Second, we introduce GraSM, a novel method that uses all the information in the graph structure of the Gene Ontology, instead of considering it as a hierarchical tree. GraSM gives a consistently higher family similarity correlation on all aspects of GO than the original semantic similarity measures.


Conference on Information and Knowledge Management | 2009

Clues for detecting irony in user-generated contents: oh...!! it's "so easy" ;-)

Paula Carvalho; Luís Sarmento; Mário J. Silva; Eugénio de Oliveira

We investigate the accuracy of a set of surface patterns in identifying ironic sentences in comments submitted by users to an on-line newspaper. The initial focus is on identifying irony in sentences containing positive predicates since these sentences are more exposed to irony, making their true polarity harder to recognize. We show that it is possible to find ironic sentences with relatively high precision (from 45% to 85%) by exploring certain oral or gestural clues in user comments, such as emoticons, onomatopoeic expressions for laughter, heavy punctuation marks, quotation marks and positive interjections. We also demonstrate that clues based on deeper linguistic information are relatively inefficient in capturing irony in user-generated content, which points to the need for exploring additional types of oral clues.
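The surface clues listed in this abstract can be illustrated with a few regular expressions. This is only a minimal sketch: the pattern set, the `CLUES` table and the `find_clues` function are hypothetical and written for English text, and they are not the patterns used in the paper.

```python
import re

# Hypothetical, illustrative patterns for the clue types named in the abstract.
CLUES = {
    "emoticon": re.compile(r"[:;]-?[)DPp]"),
    "laughter": re.compile(r"\b(?:(?:ha|he){2,}h?|lol)\b", re.IGNORECASE),
    "heavy_punctuation": re.compile(r"[!?]{2,}|\.{3,}"),
    "quotation_marks": re.compile(r'"[^"]+"'),
    "interjection": re.compile(r"\b(?:bravo|wow|congratulations)\b", re.IGNORECASE),
}


def find_clues(sentence):
    """Return the sorted names of all clue types that occur in the sentence."""
    return sorted(name for name, pattern in CLUES.items() if pattern.search(sentence))


if __name__ == "__main__":
    print(find_clues('Oh...!! it\'s "so easy" ;-)'))
    print(find_clues("The minister did a great job."))
```

A sentence with a positive predicate and at least one matching clue would then be a candidate for manual inspection, not a confirmed case of irony.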


Geographic Information Retrieval | 2006

Adding Geographic Scopes to Web Resources

Mário J. Silva; Bruno Martins; Marcirio Silveira Chaves; Ana Paula Afonso; Nuno Cardoso

Many web pages are rich in geographic information and primarily relevant to geographically limited communities. However, existing IR systems only recently began to offer local services and largely ignore geo-spatial information. This paper presents our work on automatically identifying the geographical scope of web documents, which provides the means to develop retrieval tools that take the geographical context into consideration. Our approach makes extensive use of an ontology of geographical concepts, and includes a system architecture for extracting geographic information from large collections of web documents. The proposed method involves recognising geographical references over the documents and assigning geographical scopes through a graph ranking algorithm. Initial evaluation results are encouraging, indicating the viability of this approach.


Geographic Information Retrieval | 2005

Indexing and ranking in Geo-IR systems

Bruno Martins; Mário J. Silva; Leonardo Andrade

This paper addresses document indexing and retrieval using geographical location. It discusses possible indexing structures and result ranking algorithms, surveying known approaches and showing how they can be combined to build an effective Geo-IR system.


Conference on Information and Knowledge Management | 2005

Semantic similarity over the gene ontology: family correlation and selecting disjunctive ancestors

Francisco M. Couto; Mário J. Silva; Pedro M. Coutinho

Many bioinformatics applications would benefit from comparing proteins based on their biological role rather than their sequence. In most biological databases, proteins are already annotated with ontology terms. Previous studies identified a correlation between the sequence similarity and the semantic similarity of proteins. The semantic similarity of proteins was computed from their annotated GO terms. However, proteins sharing a biological role do not necessarily have a similar sequence. This paper introduces our study of the correlation between GO and family similarity. Family similarity overcomes some of the limitations of sequence similarity, thus we obtained a strong correlation between GO and family similarity. Additionally, this paper introduces GraSM, a novel method that uses all the information in the graph structure of the GO, instead of considering it as a hierarchical tree. When calculating the semantic similarity of two concepts, GraSM selects the disjunctive common ancestors rather than only using the most informative common ancestor. GraSM produced a higher family similarity correlation than the original semantic similarity measures.


BMC Bioinformatics | 2005

Finding genomic ontology terms in text using evidence content

Francisco M. Couto; Mário J. Silva; Pedro M. Coutinho

Background: The development of text mining systems that annotate biological entities with their properties using scientific literature is an important recent research topic. These systems need first to recognize the biological entities and properties in the text, and then decide which pairs represent valid annotations. Methods: This document introduces a novel unsupervised method for recognizing biological properties in unstructured text, involving the evidence content of their names. Results: This document shows the results obtained by the application of our method to BioCreative tasks 2.1 and 2.2, where it identified Gene Ontology annotations and their evidence in a set of articles. Conclusion: From the performance obtained in BioCreative, we concluded that an automatic annotation system can effectively use our method to identify biological properties in unstructured text.


Journal of Biomedical Discovery and Collaboration | 2006

GOAnnotator: linking protein GO annotations to evidence text

Francisco M. Couto; Mário J. Silva; Vivian Lee; Emily Dimmer; Evelyn Camon; Rolf Apweiler; Harald Kirsch; Dietrich Rebholz-Schuhmann

Background: Annotation of proteins with gene ontology (GO) terms is ongoing work and a complex task. Manual GO annotation is precise and precious, but it is time-consuming. Therefore, instead of curated annotations most of the proteins come with uncurated annotations, which have been generated automatically. Text-mining systems that use literature for automatic annotation have been proposed but they do not satisfy the high quality expectations of curators. Results: In this paper we describe an approach that links uncurated annotations to text extracted from literature. The selection of the text is based on the similarity of the text to the term from the uncurated annotation. Besides substantiating the uncurated annotations, the extracted texts also lead to novel annotations. In addition, the approach uses the GO hierarchy to achieve high precision. Our approach is integrated into GOAnnotator, a tool that assists the curation process for GO annotation of UniProt proteins. Conclusion: The GO curators assessed GOAnnotator with a set of 66 distinct UniProt/SwissProt proteins with uncurated annotations. GOAnnotator provided correct evidence text at 93% precision. This high precision results from using the GO hierarchy to only select GO terms similar to GO terms from uncurated annotations in GOA. Our approach is the first one to achieve high precision, which is crucial for the efficient support of GO curators. GOAnnotator was implemented as a web tool that is freely available at http://xldb.di.fc.ul.pt/rebil/tools/goa/.


ACM Transactions on Internet Technology | 2005

Characterizing a national community web

Daniel Gomes; Mário J. Silva

This article presents a characterization of the community Web of the people of Portugal. We defined criteria for delimiting this Web based on our past experience of crawling pages related to Portugal and collected over 3.2 million documents from 46,000 sites satisfying those criteria. Our characterization was derived from this crawl. We describe the rules that we established for defining the boundaries of this community Web and the methodology used to gather statistics. Statistics cover the number and domain distribution of sites; the number, type and size distribution of text documents; and the linkage structure of this Web. We also show how crawling constraints and abnormal situations on the Web can influence the statistics.


Journal of Biomedical Semantics | 2011

Disjunctive shared information between ontology concepts: application to Gene Ontology

Francisco M. Couto; Mário J. Silva

Background: The large-scale effort in developing, maintaining and making biomedical ontologies available motivates the application of similarity measures to compare ontology concepts or, by extension, the entities described therein. A common approach, known as semantic similarity, compares ontology concepts through the information content they share in the ontology. However, different disjunctive ancestors in the ontology are frequently neglected, or not properly explored, by semantic similarity measures. Results: This paper proposes a novel method, dubbed DiShIn, that effectively exploits the multiple inheritance relationships present in many biomedical ontologies. DiShIn calculates the shared information content of two ontology concepts, based on the information content of the disjunctive common ancestors of the concepts being compared. DiShIn identifies these disjunctive ancestors through the number of distinct paths from the concepts to their common ancestors. Conclusions: DiShIn was applied to Gene Ontology and its performance was evaluated against state-of-the-art measures using CESSM, a publicly available evaluation platform of protein similarity measures. By modifying the way traditional semantic similarity measures calculate the shared information content, DiShIn was able to obtain a statistically significant higher correlation between semantic and sequence similarity. Moreover, the incorporation of DiShIn in existing applications that exploit multiple inheritance would reduce their execution time.
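The idea of selecting disjunctive common ancestors by path counts can be sketched on a toy ontology. This is one reading of the abstract, not the published implementation: the `PARENTS` graph, the `IC` values and all function names are hypothetical. The sketch assumes that common ancestors are grouped by the absolute difference in the number of distinct paths from each concept, that the most informative ancestor of each group is kept, and that the shared information content is the average over those ancestors. The published algorithm may differ in its details.

```python
from functools import lru_cache

# Hypothetical toy ontology: concept -> list of parents (a DAG with multiple inheritance).
PARENTS = {
    "root": [],
    "a": ["root"],
    "b": ["root"],
    "c": ["a", "b"],
    "d": ["a"],
    "e": ["c", "d"],
    "f": ["c"],
}

# Hypothetical information content values (higher means more specific).
IC = {"root": 0.0, "a": 1.0, "b": 1.5, "c": 2.5, "d": 2.0, "e": 3.5, "f": 3.0}


@lru_cache(maxsize=None)
def count_paths(concept, ancestor):
    """Number of distinct upward paths from concept to ancestor."""
    if concept == ancestor:
        return 1
    return sum(count_paths(parent, ancestor) for parent in PARENTS[concept])


def ancestors(concept):
    """Set containing the concept itself and all of its ancestors."""
    seen = {concept}
    stack = [concept]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def shared_ic_mica(c1, c2):
    """Baseline: IC of the most informative common ancestor only."""
    return max(IC[a] for a in ancestors(c1) & ancestors(c2))


def shared_ic_disjunctive(c1, c2):
    """Average IC of the best common ancestor for each path-count difference."""
    best = {}
    for a in ancestors(c1) & ancestors(c2):
        diff = abs(count_paths(c1, a) - count_paths(c2, a))
        if diff not in best or IC[a] > best[diff]:
            best[diff] = IC[a]
    return sum(best.values()) / len(best)


if __name__ == "__main__":
    print(shared_ic_mica("e", "f"))
    print(shared_ic_disjunctive("e", "f"))
```

In this toy graph, concepts `e` and `f` share ancestor `c` through the same number of paths but reach `a` through a different number of paths, so `a` contributes as a second, less informative, disjunctive ancestor and lowers the shared value relative to the baseline.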


Conference on Information and Knowledge Management | 2009

Automatic creation of a reference corpus for political opinion mining in user-generated content

Luís Sarmento; Paula Carvalho; Mário J. Silva; Eugénio de Oliveira

We propose and evaluate a method for automatically creating a reference corpus for training text classification procedures for mining political opinions in user-generated content. The process starts by compiling a collection of highly opinionated comments posted by users on an on-line newspaper. Then, we define and use a set of manually crafted high-precision rules supported by a large sentiment lexicon in order to identify sentences in each comment expressing opinions about political entities. Finally, the opinions found are propagated to the remaining sentences of the comment mentioning the same entities, thus increasing the number and variety of opinion-bearing sentences. Results show that most of the rules can identify negative opinions with very high precision, and these can be safely propagated to the remaining sentences in the comment in almost 100% of the cases. Due to problems arising from irony, the precision of identification drops for positive opinions, but several rules still reach high precision. Propagation of positive opinions is correct in about 77% of the cases, and most errors at this stage result from irony and polarity inversion throughout the comment.
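The propagation step described in this abstract can be sketched as follows. The `Sentence` type and the `propagate` function are hypothetical, and the sketch assumes that rule matching and entity recognition have already been done. It also assumes that the first rule-assigned polarity for an entity is used as the seed, and that a sentence is left unlabeled when its entities carry conflicting polarities. The paper does not state these choices.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sentence:
    text: str
    entities: frozenset                # political entities mentioned in the sentence
    polarity: Optional[str] = None     # "neg" or "pos" if a rule fired, else None


def propagate(comment):
    """Copy rule-assigned polarities to unlabeled sentences about the same entity."""
    # Seed polarity per entity, taken from the first rule-labeled sentence (assumption).
    seeds = {}
    for sentence in comment:
        if sentence.polarity is not None:
            for entity in sentence.entities:
                seeds.setdefault(entity, sentence.polarity)

    result = []
    for sentence in comment:
        polarity = sentence.polarity
        if polarity is None:
            labels = {seeds[e] for e in sentence.entities if e in seeds}
            # Propagate only when all seeded entities agree (assumption).
            polarity = labels.pop() if len(labels) == 1 else None
        result.append(Sentence(sentence.text, sentence.entities, polarity))
    return result


if __name__ == "__main__":
    comment = [
        Sentence("Minister X failed again.", frozenset({"X"}), "neg"),
        Sentence("X promised the opposite last year.", frozenset({"X"})),
        Sentence("Nobody mentioned Y.", frozenset({"Y"})),
    ]
    for sentence in propagate(comment):
        print(sentence.polarity, sentence.text)
```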

Collaboration



Top Co-Authors

Bruno Martins

Instituto Superior Técnico
