Publications


Featured research published by José Iria.


Journal of Web Semantics | 2006

Semantic annotation for knowledge management: Requirements and a survey of the state of the art

Victoria S. Uren; Philipp Cimiano; José Iria; Siegfried Handschuh; Maria Vargas-Vera; Enrico Motta; Fabio Ciravegna

While much of a company's knowledge can be found in text repositories, current content management systems have limited capabilities for structuring and interpreting documents. In the emerging Semantic Web, search, interpretation and aggregation can be addressed by ontology-based semantic mark-up. In this paper, we examine semantic annotation, identify a number of requirements, and review the current generation of semantic annotation systems. This analysis shows that, while there is still some way to go before semantic annotation tools can fully address all the knowledge management needs, research in the area is active and making good progress.


Journal of Intelligent Manufacturing | 2009

Applying semantic web technologies to knowledge sharing in aerospace engineering

Aba-Sah Dadzie; Ravish Bhagdev; Ajay Chakravarthy; Sam Chapman; José Iria; Vitaveska Lanfranchi; João Magalhães; Daniela Petrelli; Fabio Ciravegna

This paper details an integrated methodology to optimise knowledge reuse and sharing, illustrated with a use case in the aeronautics domain. It uses ontologies as a central modelling strategy for the capture of knowledge from legacy documents via automated means, or directly in systems interfacing with knowledge workers, via user-defined, web-based forms. The domain ontologies used for knowledge capture also guide the retrieval of the knowledge extracted from the data using a semantic search system that provides support for multiple modalities during search. This approach has been applied and evaluated successfully within the aerospace domain, and is currently being extended for use in other domains on an increasingly large scale.


Text, Speech and Dialogue | 2009

Improving Patient Opinion Mining through Multi-step Classification

Lei Xia; Anna Lisa Gentile; James Munro; José Iria

Automatically tracking attitudes, feelings and reactions in on-line forums, blogs and news is a desirable instrument to support statistical analyses by companies, the government, and even individuals. In this paper, we present a novel approach to polarity classification of short text snippets, which takes into account the way data are naturally distributed into several topics in order to obtain better classification models for polarity. Our approach is multi-step: in the initial step a standard topic classifier is learned from the data and the topic labels, and in the ensuing step several polarity classifiers, one per topic, are learned from the data and the polarity labels. We empirically show that our approach improves classification accuracy on a real-world dataset by over 10%, when compared against a standard single-step approach using the same feature sets. The approach is applicable whenever training material is available for building both topic and polarity learning models.
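
The two-step pipeline described in the abstract can be sketched as follows; the keyword-based classifiers and the review-domain labels are purely illustrative stand-ins for the learned models in the paper:

```python
# Sketch of the multi-step idea: route each snippet through a topic
# classifier first, then apply a polarity classifier built for that topic.
# The classifiers below are trivial keyword scorers, not learned models.

def topic_classifier(snippet):
    """Step 1: assign a coarse topic label (illustrative rule)."""
    return "food" if "meal" in snippet or "taste" in snippet else "service"

# Step 2: one polarity model per topic, so topic-specific cues
# (e.g. "cold" being negative for food) are handled separately.
POLARITY_BY_TOPIC = {
    "food": {"delicious": +1, "cold": -1, "bland": -1},
    "service": {"friendly": +1, "rude": -1},
}

def classify_polarity(snippet):
    topic = topic_classifier(snippet)
    lexicon = POLARITY_BY_TOPIC[topic]
    score = sum(w for word, w in lexicon.items() if word in snippet)
    return topic, ("positive" if score >= 0 else "negative")
```

Keeping one polarity model per topic is the point of the multi-step design: the same word can carry different polarity weight depending on the topic it appears under.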


Proceedings of the 2009 Workshop on The People's Web Meets NLP: Collaboratively Constructed Semantic Resources | 2009

A Novel Approach to Automatic Gazetteer Generation using Wikipedia

Ziqi Zhang; José Iria

Gazetteers or entity dictionaries are important knowledge resources for solving a wide range of NLP problems, such as entity extraction. We introduce a novel method to automatically generate gazetteers from seed lists using an external knowledge resource, Wikipedia. Unlike previous methods, our method exploits the rich content and various structural elements of Wikipedia, and does not rely on language- or domain-specific knowledge. Furthermore, applying the extended gazetteers to an entity extraction task in a scientific domain, we empirically observed a significant improvement in system accuracy when compared with systems using the original seed gazetteers.
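
A minimal sketch of the seed-expansion idea, using a hand-made stand-in for Wikipedia's category structure (the entities and categories below are illustrative, not the paper's data or method details):

```python
# Toy stand-in for Wikipedia structure: entity -> set of categories.
# In the paper this kind of information is mined from real Wikipedia pages.
WIKI_CATEGORIES = {
    "aspirin":     {"drugs", "analgesics"},
    "ibuprofen":   {"drugs", "analgesics", "anti-inflammatories"},
    "paracetamol": {"drugs", "analgesics"},
    "sheffield":   {"cities", "england"},
}

def expand_gazetteer(seeds, min_shared=1):
    """Add every known entity that shares at least min_shared categories
    with the categories collected from the seed list."""
    seed_cats = set().union(*(WIKI_CATEGORIES[s] for s in seeds))
    return {e for e, cats in WIKI_CATEGORIES.items()
            if len(cats & seed_cats) >= min_shared}
```

Starting from the single seed "aspirin", the expansion picks up the other entities that share its categories while leaving unrelated pages out.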


Annual SRII Global Conference | 2011

Automatic Classification of Change Requests for Improved IT Service Quality

Cristina Kadar; Dorothea Wiesmann; José Iria; Dirk Husemann; Mario Lucic

Faulty changes to the IT infrastructure can lead to critical system and application outages, and therefore cause serious economic losses. In this paper, we describe a change planning support tool that aims at assisting the change requesters in leveraging aggregated information associated with the change, like past failure reasons or best implementation practices. The knowledge gained in this way can be used in the subsequent planning and implementation steps of the change. Optimal matching of change requests with the aggregated information is achieved through the classification of the change request into about 200 fine-grained activities. We propose to automatically classify the incoming change requests using various information retrieval and machine learning techniques. The cost of building the classifiers is reduced by employing active learning techniques or by leveraging labeled features. Historical tickets from two customers were used to empirically assess and compare the accuracy of the different classification approaches (Lucene index, multinomial logistic regression, and generalized expectation criteria).


Italian Research Conference on Digital Library Management Systems | 2010

Semantic Relatedness Approach for Named Entity Disambiguation

Anna Lisa Gentile; Ziqi Zhang; Lei Xia; José Iria

Natural language is a means to express and discuss concepts, objects and events, i.e., it carries semantic content. One of the ultimate aims of Natural Language Processing techniques is to identify the meaning of the text, providing effective ways to make a proper linkage between textual references and their referents, that is, real-world objects. This work addresses the problem of giving a sense to proper names in a text, that is, automatically associating words representing Named Entities with their referents. The proposed methodology for Named Entity Disambiguation is based on Semantic Relatedness Scores obtained with a graph-based model over Wikipedia. We show that, without building a Bag-of-Words representation of the text, but only considering named entities within the text, the proposed paradigm achieves results competitive with the state of the art on two different datasets.
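
The disambiguation strategy can be sketched with a toy link graph; the inlink sets and the overlap-based relatedness measure below are illustrative simplifications of the graph-based Wikipedia model in the paper:

```python
# Toy inlink sets for candidate Wikipedia senses (a stand-in for the
# real Wikipedia link graph).
INLINKS = {
    "Paris (city)": {"France", "Seine", "Eiffel Tower"},
    "Paris (hero)": {"Troy", "Helen", "Iliad"},
    "France":       {"Paris (city)", "Seine", "Europe"},
}

def relatedness(a, b):
    """Inlink-set overlap (Jaccard), a simple proxy for graph-based
    semantic relatedness between two Wikipedia senses."""
    ia, ib = INLINKS[a], INLINKS[b]
    return len(ia & ib) / len(ia | ib)

def disambiguate(candidates, context_senses):
    """Pick the candidate sense most related to the senses of the other
    named entities found in the same text."""
    return max(candidates,
               key=lambda c: sum(relatedness(c, s) for s in context_senses))
```

With "France" already resolved in the context, the city sense of "Paris" wins because its sense shares more of the link graph with the context than the mythological sense does.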


European Semantic Web Conference | 2009

A Core Ontology of Knowledge Acquisition

José Iria

Semantic descriptions of knowledge acquisition (KA) tools and resources enable machine reasoning about KA systems and can be used to automate the discovery and composition of KA services, thereby increasing interoperability among systems and reducing system design and maintenance costs. Whilst there are a few general-purpose ontologies available that could be combined for describing knowledge acquisition, albeit at an inadequate abstraction level, there is as yet no KA ontology based on Semantic Web technologies available. In this paper, we present OAK, a well-founded, modular, extensible and multimedia-aware ontology of knowledge acquisition which extends existing foundational and core Semantic Web ontologies. We start by using a KA tool development scenario to illustrate the complexity of the problem, and identify a number of requirements for OAK. After we present the ontology in detail, we evaluate it with respect to the identified requirements.


Conference on Image and Video Retrieval | 2009

Web news categorization using a cross-media document graph

José Iria; Fabio Ciravegna; João Magalhães

In this paper we propose a multimedia categorization framework that is able to exploit information across different parts of a multimedia document (e.g., a Web page, a PDF, a Microsoft Office document). For example, a Web news page is composed of text describing some event (e.g., a car accident) and a picture containing additional information regarding the real extent of the event (e.g., how damaged the car is) or providing evidence corroborating the text part. The framework handles multimedia information by considering not only the document's text and image data but also the layout structure, which determines how a given text block is related to a particular image. The novelties and contributions of the proposed framework are: (1) support of heterogeneous types of multimedia documents; (2) a document-graph representation method; and (3) the computation of cross-media correlations. Moreover, we applied the framework to the task of categorising Web news feed data, and our results show a significant improvement over a single-medium based framework.
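
A minimal sketch of the document-graph representation, with hypothetical node names and features: text blocks and images become nodes, layout adjacency becomes edges, and categorisation can then pool features across linked media instead of using text alone:

```python
# A web-news page as a graph: nodes are text blocks and images,
# edges come from the layout (e.g. a caption sits next to its picture).
# Node names and feature sets are illustrative only.
nodes = {
    "text:headline": {"car", "accident"},
    "text:caption":  {"damage"},
    "img:photo":     {"vehicle", "wreck"},   # e.g. visual-word features
}
edges = [("text:headline", "text:caption"),
         ("text:caption", "img:photo")]      # layout adjacency

def cross_media_features(node):
    """Features of a node pooled with those of its layout neighbours."""
    feats = set(nodes[node])
    for a, b in edges:
        if node == a:
            feats |= nodes[b]
        elif node == b:
            feats |= nodes[a]
    return feats
```

The short caption node, nearly featureless on its own, inherits evidence from both the headline text and the image it sits next to, which is the kind of cross-media correlation the framework exploits.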


International Conference on Machine Learning and Applications | 2011

L1 vs. L2 Regularization in Text Classification when Learning from Labeled Features

Sinziana Mazilu; José Iria

In this paper we study the problem of building document classifiers using labeled features and unlabeled documents, where not all the features are helpful for the process of learning. This is an important setting, since building classifiers using labeled words has been recently shown to require considerably less human labeling effort than building classifiers using labeled documents. We propose the use of Generalized Expectation (GE) criteria combined with an L1 regularization term for learning from labeled features. This lets the feature labels guide model expectation constraints, while approaching feature selection from a regularization perspective. We show that GE criteria combined with L1 regularization consistently outperform -- by up to a 12% increase in accuracy -- the best previously reported results in the literature under the same setting, obtained using L2 regularization. Furthermore, the results obtained with GE criteria and an L1 regularizer are competitive with those obtained in the traditional instance-labeling setting, with the same labeling cost.
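
The contrast between the two regularizers can be illustrated with a single update step (the numbers are illustrative, not from the paper): the proximal soft-thresholding step associated with L1 zeroes small weights outright, which is what yields feature selection, while an L2 gradient step only scales all weights toward zero:

```python
def l1_prox(w, step, lam):
    """Soft-thresholding: the proximal operator of step * lam * |w|.
    Weights with magnitude below the threshold are set exactly to zero."""
    t = step * lam
    return [0.0 if abs(x) <= t else (x - t if x > 0 else x + t) for x in w]

def l2_shrink(w, step, lam):
    """Gradient step on (lam / 2) * ||w||^2: multiplicative shrinkage.
    Weights get smaller but never reach exactly zero."""
    return [x * (1 - step * lam) for x in w]

weights = [0.05, -0.02, 0.9]   # small weights = unhelpful features
sparse = l1_prox(weights, step=1.0, lam=0.1)    # small weights zeroed
dense = l2_shrink(weights, step=1.0, lam=0.1)   # all weights kept, scaled
```

This difference is why an L1 term can discard unhelpful labeled features during learning, whereas L2 keeps every feature active with a reduced weight.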


European Conference on Information Retrieval | 2011

Domain adaptation for text categorization by feature labeling

Cristina Kadar; José Iria

We present a novel approach to domain adaptation for text categorization, which merely requires that the source domain data are weakly annotated in the form of labeled features. The main advantage of our approach resides in the fact that labeling words is less expensive than labeling documents. We propose two methods, the first of which seeks to minimize the divergence between the distributions of the source domain, which contains labeled features, and the target domain, which contains only unlabeled data. The second method augments the labeled features set in an unsupervised way, via the discovery of a shared latent concept space between source and target. We empirically show that our approach outperforms standard supervised and semi-supervised methods, and obtains results competitive with those reported by state-of-the-art domain adaptation methods, while requiring considerably less supervision.
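
The labeled-features setting itself, in which an annotator labels a handful of indicative words rather than whole documents, can be sketched as follows; the word labels and documents are hypothetical examples, not the paper's divergence-minimization method:

```python
# Weak supervision: a few words are labeled with the class they indicate.
# This is far cheaper to obtain than a corpus of labeled documents.
LABELED_FEATURES = {
    "goal": "sports", "match": "sports",
    "election": "politics", "senate": "politics",
}

def classify(document, default="unknown"):
    """Score an unlabeled document by the votes of its labeled words."""
    votes = {}
    for token in document.lower().split():
        label = LABELED_FEATURES.get(token)
        if label:
            votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get) if votes else default
```

A real system would turn these word labels into model expectation constraints rather than raw votes, but the supervision cost argument is the same: four labeled words here stand in for what would otherwise require many labeled documents.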

Collaboration


Explore José Iria's collaborations.

Top Co-Authors

Lei Xia (University of Sheffield)
Ziqi Zhang (University of Sheffield)
Yorick Wilks (University of Sheffield)
Sam Chapman (University of Sheffield)