Publication


Featured research published by Niraj Aswani.


ACM Conference on Hypertext | 2013

Microblog-genre noise and impact on semantic annotation accuracy

Leon Derczynski; Diana Maynard; Niraj Aswani; Kalina Bontcheva

Using semantic technologies for mining and intelligent information access to microblogs is a challenging, emerging research area. Unlike carefully authored news text and other longer content, tweets pose a number of new challenges, due to their short, noisy, context-dependent, and dynamic nature. Semantic annotation of tweets is typically performed in a pipeline, comprising successive stages of language identification, tokenisation, part-of-speech tagging, named entity recognition and entity disambiguation (e.g. with respect to DBpedia). Consequently, errors are cumulative, and earlier-stage problems can severely reduce the performance of final stages. This paper presents a characterisation of genre-specific problems at each semantic annotation stage and the impact on subsequent stages. Critically, we evaluate impact on two high-level semantic annotation tasks: named entity detection and disambiguation. Our results demonstrate the importance of making approaches specific to the genre, and indicate a diminishing returns effect that reduces the effectiveness of complex text normalisation.


Language Resources and Evaluation | 2013

GATE Teamware: a web-based, collaborative text annotation framework

Kalina Bontcheva; Hamish Cunningham; Ian Roberts; Angus Roberts; Valentin Tablan; Niraj Aswani; Genevieve Gorrell

This paper presents GATE Teamware, an open-source, web-based, collaborative text annotation framework. It enables users to carry out complex corpus annotation projects involving distributed annotator teams. Different user roles are provided (annotator, manager, administrator) with customisable user interface functionalities, in order to support the complex workflows and user interactions that occur in corpus annotation projects. Documents may be pre-processed automatically, so that human annotators can begin with text that has already been pre-annotated, which makes them more efficient. The user interface is simple to learn, aimed at non-experts, and runs in an ordinary web browser without the need to install additional software. GATE Teamware has been evaluated through the creation of several gold standard corpora and internal projects, as well as through external evaluation in commercial and EU text annotation projects. It is available as an on-demand service on GateCloud.net, as well as open source for self-installation.


Patent Information Retrieval | 2008

Large-scale, parallel automatic patent annotation

Milan Agatonovic; Niraj Aswani; Kalina Bontcheva; Hamish Cunningham; Thomas Heitz; Yaoyong Li; Ian Roberts; Valentin Tablan

When researching new product ideas or filing new patents, inventors need to retrieve all relevant pre-existing know-how and/or to exploit and enforce patents in their technological domain. However, this process is hindered by the lack of richer metadata which, if present, would allow more powerful concept-based search to complement the current keyword-based approach. This paper presents our approach to automatic patent enrichment, tested in large-scale, parallel experiments on USPTO and EPO documents. It begins by defining the metadata annotation task and examining its challenges. The text analysis tools are presented next, including details on automatic annotation of sections, references and measurements. The key challenges encountered were dealing with ambiguities and errors in the data; creating and maintaining large, domain-independent dictionaries; and building an efficient, robust patent analysis pipeline capable of dealing with terabytes of data. The accuracy of automatically created metadata is evaluated against a human-annotated gold standard, with results of over 90% on most annotation types.


Meeting of the Association for Computational Linguistics | 2005

A Hybrid Approach to Align Sentences and Words in English-Hindi Parallel Corpora

Niraj Aswani; Robert J. Gaizauskas

In this paper we describe an alignment system that aligns English-Hindi texts at the sentence and word level in parallel corpora. We describe a simple sentence-length approach to sentence alignment and a hybrid, multi-feature approach to word alignment. We use regression techniques to learn parameters that characterise the relationship between the lengths of two sentences in parallel text. Our multi-feature approach uses dictionary lookup as the primary technique, together with other methods such as local word grouping, transliteration similarity (edit distance) and a nearest aligned neighbours approach, to deal with many-to-many word alignment. Our experiments are based on the EMILLE (Enabling Minority Language Engineering) corpus. We obtained 99.09% accuracy for many-to-many sentence alignment and 77% precision and 67.79% recall for many-to-many word alignment.


Patent Information Retrieval | 2011

Information Extraction and Semantic Annotation for Multi-Paradigm Information Management

Hamish Cunningham; Valentin Tablan; Ian Roberts; Mark A. Greenwood; Niraj Aswani

This chapter describes the development of GATE Mimir, a new tool for indexing documents according to multiple paradigms: full text, conceptual model, and annotation structures. We also present a usage example for patent searchers covering measurements and high-level structural information which was automatically extracted from a large patent corpus.


PLOS ONE | 2012

Using Prior Information from the Medical Literature in GWAS of Oral Cancer Identifies Novel Susceptibility Variant on Chromosome 4 - the AdAPT Method

Mattias Johansson; Angus Roberts; Dan Chen; Yaoyong Li; Manon Delahaye-Sourdeix; Niraj Aswani; Mark A. Greenwood; Simone Benhamou; Pagona Lagiou; Ivana Holcatova; Lorenzo Richiardi; Kristina Kjaerheim; Antonio Agudo; Xavier Castellsagué; Tatiana V. Macfarlane; Luigi Barzan; Cristina Canova; Nalin Thakker; David I. Conway; Ariana Znaor; Claire M. Healy; Wolfgang Ahrens; David Zaridze; Neonilia Szeszenia-Dabrowska; Jolanta Lissowska; Eleonora Fabianova; Ioan Nicolae Mates; Vladimir Bencko; Lenka Foretova; Vladimir Janout

Background: Genome-wide association studies (GWAS) require large sample sizes to obtain adequate statistical power, but it may be possible to increase the power by incorporating complementary data. In this study we investigated the feasibility of automatically retrieving information from the medical literature and leveraging this information in GWAS.

Methods: We developed a method that searches through PubMed abstracts for pre-assigned keywords and key concepts, and uses this information to assign prior probabilities of association for each single nucleotide polymorphism (SNP) with the phenotype of interest: the Adjusting Association Priors with Text (AdAPT) method. Association results from a GWAS can subsequently be ranked in the context of these priors using the Bayes False Discovery Probability (BFDP) framework. We initially tested AdAPT by comparing rankings of known susceptibility alleles in a previous lung cancer GWAS, and subsequently applied it in a two-phase GWAS of oral cancer.

Results: Known lung cancer susceptibility SNPs were consistently ranked higher by AdAPT BFDPs than by p-values. In the oral cancer GWAS, we sought to replicate the top five SNPs as ranked by AdAPT BFDPs, of which rs991316, located in the ADH gene region of 4q23, displayed a statistically significant association with oral cancer risk in the replication phase (per-rare-allele log additive p-value [p_trend] = 2.5 × 10^−3). The combined odds ratio (OR) for having one additional rare allele was 0.83 (95% CI: 0.76–0.90), and this association was independent of previously identified susceptibility SNPs in this gene region that are associated with overall upper aerodigestive tract (UADT) cancer. We also investigated whether rs991316 was associated with other UADT cancers, but no additional association signal was found.

Conclusion: This study highlights the potential utility of systematically incorporating prior knowledge from the medical literature in genome-wide analyses using the AdAPT methodology. AdAPT is available online (url: http://services.gate.ac.uk/lld/gwas/service/config).


Artificial Intelligence: Methodology, Systems, Applications | 2004

Automatic Creation and Monitoring of Semantic Metadata in a Dynamic Knowledge Portal

Diana Maynard; Milena Yankova; Niraj Aswani; Hamish Cunningham

The h-TechSight Knowledge Management Portal supports knowledge-intensive industries in monitoring information resources on the Web, an important factor in business competitiveness. Users can be automatically notified when a change occurs in their domain of interest. As part of this knowledge management platform, we have developed an ontology-based information extraction system to identify instances of concepts relevant to the user’s interests and to monitor them over time. The application was initially implemented in the Employment domain and is currently being extended to other areas of the Chemical Engineering field. The information extraction system has been evaluated on a test set of 38 documents and achieves 97% precision and 92% recall.


Recent Advances in Natural Language Processing | 2013

TwitIE: An Open-Source Information Extraction Pipeline for Microblog Text

Kalina Bontcheva; Leon Derczynski; Adam Funk; Mark A. Greenwood; Diana Maynard; Niraj Aswani


Archive | 2007

Indexing and querying linguistic metadata and document content

Niraj Aswani; Valentin Tablan; Kalina Bontcheva; Hamish Cunningham


Archive | 2010

Developing Language Processing Components with GATE Version 5 (a User Guide)

Hamish Cunningham; Diana Maynard; Kalina Bontcheva; Valentin Tablan; Niraj Aswani; Ian Roberts; Genevieve Gorrell; Adam Funk; Angus Roberts; Danica Damljanovic; Thomas Heitz; R. Mark Greenwood; Horacio Saggion; Johann Petrak; Yaoyong Li; William A. Peters

Collaboration


Top Co-Authors

Ian Roberts

University of Sheffield

Yaoyong Li

University of Manchester
