Publication


Featured research published by Sophia Ananiadou.


International Journal on Digital Libraries | 2000

Automatic recognition of multi-word terms: the C-value/NC-value method

K Frantzi; Sophia Ananiadou; Hideki Mima

Technical terms (henceforth called terms) are important elements for digital libraries. In this paper we present a domain-independent method for the automatic extraction of multi-word terms from machine-readable special language corpora. The method (C-value/NC-value) combines linguistic and statistical information. The first part, C-value, enhances the common statistical measure of frequency of occurrence for term extraction, making it sensitive to a particular type of multi-word term, the nested term. The second part, NC-value, provides: 1) a method for extracting term context words (words that tend to appear with terms); 2) a way of incorporating information from term context words into the extraction of terms.
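The C-value part described above is usually stated as follows: for a candidate a of |a| words with frequency f(a), C-value(a) = log2|a| * f(a) if a is not nested in any longer candidate, and otherwise log2|a| * (f(a) - mean frequency of the longer candidates that contain a). The sketch below computes only that score from a table of candidate frequencies. It is a simplified illustration under stated assumptions, not the authors' implementation: the linguistic filters and the longest-to-shortest processing order of the published method are omitted, and the example frequencies are invented.

```python
import math
from typing import Dict, Tuple

Term = Tuple[str, ...]  # a candidate term as a tuple of words


def is_nested_in(shorter: Term, longer: Term) -> bool:
    """True if `shorter` occurs as a contiguous word sequence strictly inside `longer`."""
    m, n = len(shorter), len(longer)
    return m < n and any(longer[i:i + m] == shorter for i in range(n - m + 1))


def c_value(freqs: Dict[Term, int]) -> Dict[Term, float]:
    """C-value for every multi-word candidate, given its corpus frequency."""
    scores: Dict[Term, float] = {}
    for a, f_a in freqs.items():
        # T_a: the longer candidates in which `a` appears nested
        t_a = [b for b in freqs if is_nested_in(a, b)]
        if t_a:
            # discount by the mean frequency of the longer candidates containing `a`
            mean_longer = sum(freqs[b] for b in t_a) / len(t_a)
            scores[a] = math.log2(len(a)) * (f_a - mean_longer)
        else:
            scores[a] = math.log2(len(a)) * f_a
    return scores


# Hypothetical frequencies: "contact lens" mostly occurs inside "soft contact lens".
freqs = {("soft", "contact", "lens"): 8, ("contact", "lens"): 10}
print(c_value(freqs))
```

Here "contact lens" scores only log2(2) * (10 - 8) = 2, because most of its occurrences are explained by the longer term; a candidate that appears independently of longer terms keeps its full frequency.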


Panhellenic Conference on Informatics | 2005

Developing a robust part-of-speech tagger for biomedical text

Yoshimasa Tsuruoka; Yuka Tateishi; Jin-Dong Kim; Tomoko Ohta; John McNaught; Sophia Ananiadou; Jun’ichi Tsujii

This paper presents a part-of-speech tagger which is specifically tuned for biomedical text. We have built the tagger with maximum entropy modeling and a state-of-the-art tagging algorithm. The tagger was trained on a corpus containing newspaper articles and biomedical documents so that it would work well on various types of biomedical text. Experimental results on the Wall Street Journal corpus, the GENIA corpus, and the PennBioIE corpus revealed that adding training data from a different domain does not hurt the performance of a tagger, and our tagger exhibits very good precision (97% to 98%) on all these corpora. We also evaluated the robustness of the tagger using recent MEDLINE articles.
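The abstract names maximum entropy modeling as the tagger's learning method. The sketch below shows only how such a log-linear model turns the active features of one token into a tag distribution (a softmax over summed feature weights). The feature names, weights and tag set are invented for illustration; training and the sequence-level tagging algorithm are not shown, and this is not the released tagger's code.

```python
import math
from typing import Dict, List, Tuple

Weights = Dict[Tuple[str, str], float]  # (feature, tag) -> learned weight


def maxent_tag_distribution(features: List[str], tags: List[str], weights: Weights) -> Dict[str, float]:
    """p(tag | context) = exp(sum of active feature weights) / Z."""
    scores = {t: sum(weights.get((f, t), 0.0) for f in features) for t in tags}
    top = max(scores.values())
    exps = {t: math.exp(s - top) for t, s in scores.items()}  # shift by max to avoid overflow
    z = sum(exps.values())
    return {t: e / z for t, e in exps.items()}


# Hypothetical features for the word "binds" in "p53 binds DNA".
features = ["w=binds", "suffix=s", "prev_w=p53"]
weights = {("w=binds", "VBZ"): 2.0, ("suffix=s", "NNS"): 1.0, ("suffix=s", "VBZ"): 0.5}
probs = maxent_tag_distribution(features, ["NN", "NNS", "VBZ"], weights)
print(max(probs, key=probs.get))  # → VBZ
```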


Trends in Biotechnology | 2010

Event Extraction for Systems Biology by Text Mining the Literature

Sophia Ananiadou; Sampo Pyysalo; Jun’ichi Tsujii; Douglas B. Kell

Systems biology recognizes in particular the importance of interactions between biological components and the consequences of these interactions. Such interactions and their downstream effects are known as events. To computationally mine the literature for such events, text mining methods that can detect, extract and annotate them are required. This review summarizes the methods that are currently available, with a specific focus on protein-protein interactions and pathway or network reconstruction. The approaches described will be of considerable value in associating particular pathways and their components with higher-order physiological properties, including disease states.


European Conference on Research and Advanced Technology for Digital Libraries | 1998

The C-value/NC-value Method of Automatic Recognition for Multi-Word Terms

K Frantzi; Sophia Ananiadou; Jun’ichi Tsujii

Technical terms (henceforth called simply terms) are important elements for digital libraries. In this paper we present a domain-independent method for the automatic extraction of multi-word terms from machine-readable special language corpora. The method (C-value/NC-value) combines linguistic and statistical information. The first part, C-value, enhances the common statistical measure of frequency of occurrence for term extraction, making it sensitive to a particular type of multi-word term, the nested term. The second part, NC-value, provides: 1) a method for extracting term context words (words that tend to appear with terms); 2) a way of incorporating information from term context words into the extraction of terms.
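The NC-value step mentioned in this abstract re-ranks candidates using their context words. In the journal version of the method, as we understand it, each context word w gets weight(w) = t(w) / n, where t(w) is the number of terms w appears with and n is the number of terms considered, and NC-value(a) = 0.8 * C-value(a) + 0.2 * sum of f_a(w) * weight(w) over the context words w of a. The sketch below follows that reading; the 0.8/0.2 constants are exposed as parameters, and the function names and example data are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Term = Tuple[str, ...]


def context_word_weights(term_contexts: Dict[Term, Iterable[str]]) -> Dict[str, float]:
    """weight(w) = t(w) / n: share of the n terms that word w appears next to."""
    n = len(term_contexts)
    t: Counter = Counter()
    for words in term_contexts.values():
        t.update(set(words))  # each term counts at most once per context word
    return {w: c / n for w, c in t.items()}


def nc_value(c_val: float, context_freqs: Dict[str, int], weights: Dict[str, float],
             alpha: float = 0.8, beta: float = 0.2) -> float:
    """Combine a term's C-value with the weighted frequencies of its context words."""
    context_score = sum(f * weights.get(w, 0.0) for w, f in context_freqs.items())
    return alpha * c_val + beta * context_score


# Hypothetical context words observed next to two top-ranked terms.
weights = context_word_weights({("t", "cell"): ["of", "activated", "of"], ("b", "cell"): ["of"]})
print(nc_value(10.0, {"of": 3, "activated": 1}, weights))
```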


International Joint Conference on Natural Language Processing | 2009

Stochastic Gradient Descent Training for L1-regularized Log-linear Models with Cumulative Penalty

Yoshimasa Tsuruoka; Jun’ichi Tsujii; Sophia Ananiadou

Stochastic gradient descent (SGD) uses approximate gradients estimated from subsets of the training data and updates the parameters in an online fashion. This learning framework is attractive because it often requires much less training time in practice than batch training algorithms. However, L1-regularization, which is becoming popular in natural language processing because of its ability to produce compact models, cannot be efficiently applied in SGD training, due to the large dimensions of feature vectors and the fluctuations of approximate gradients. We present a simple method to solve these problems by penalizing the weights according to cumulative values for the L1 penalty. We evaluate the effectiveness of our method in three applications: text chunking, named entity recognition, and part-of-speech tagging. Experimental results demonstrate that our method can produce compact and accurate models much more quickly than a state-of-the-art quasi-Newton method for L1-regularized log-linear models.
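The cumulative-penalty idea can be sketched on a binary logistic regression with sparse features. A running total u records the L1 penalty every weight could have received so far, q[f] records the penalty actually applied to weight f, and after each gradient step the weight is clipped toward zero by the difference, never crossing zero. The learning-rate schedule, the constants and the toy data are assumptions for illustration; the paper applies the method to log-linear models for chunking, NER and tagging rather than this toy classifier.

```python
import math
import random
from typing import Dict, List, Tuple

Example = Tuple[Dict[str, float], int]  # (sparse feature vector, label in {0, 1})


def sigmoid(z: float) -> float:
    # numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def apply_cumulative_penalty(w: Dict[str, float], q: Dict[str, float], f: str, u: float) -> None:
    """Clip w[f] toward zero by the penalty it has not yet received."""
    z = w[f]
    if z > 0:
        w[f] = max(0.0, z - (u + q.get(f, 0.0)))
    elif z < 0:
        w[f] = min(0.0, z + (u - q.get(f, 0.0)))
    q[f] = q.get(f, 0.0) + (w[f] - z)  # record the penalty actually applied


def train(data: List[Example], C: float = 0.1, epochs: int = 10,
          eta0: float = 0.1, decay: float = 0.85, seed: int = 0) -> Dict[str, float]:
    rng = random.Random(seed)
    n = len(data)
    w: Dict[str, float] = {}
    q: Dict[str, float] = {}
    u = 0.0  # total L1 penalty each weight could have received so far
    k = 0
    for _ in range(epochs):
        order = list(range(n))
        rng.shuffle(order)
        for i in order:
            x, y = data[i]
            eta = eta0 * decay ** (k / n)  # exponentially decaying learning rate (assumed schedule)
            k += 1
            u += eta * C / n
            p = sigmoid(sum(w.get(f, 0.0) * v for f, v in x.items()))
            # only features active in this example are updated and penalized
            for f, v in x.items():
                w[f] = w.get(f, 0.0) + eta * (y - p) * v  # ascent step on the log-likelihood
                apply_cumulative_penalty(w, q, f, u)
    return {f: v for f, v in w.items() if v != 0.0}  # keep only non-zero weights


data = [({"good": 1.0}, 1), ({"bad": 1.0}, 0)] * 20
print(train(data))
```

Because the penalty is applied lazily, only to features active in the current example, the per-update cost stays proportional to the number of active features rather than the full feature dimension. With a very large C every weight is clipped to zero, which is the sparsity L1 regularization is meant to deliver.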


Bioinformatics | 2008

FACTA: a text search engine for finding associated biomedical concepts

Yoshimasa Tsuruoka; Jun’ichi Tsujii; Sophia Ananiadou

Summary: FACTA is a text search engine for MEDLINE abstracts, which is designed particularly to help users browse biomedical concepts (e.g. genes/proteins, diseases, enzymes and chemical compounds) appearing in the documents retrieved by the query. The concepts are presented to the user in a tabular format and ranked based on co-occurrence statistics. Unlike existing systems that provide similar functionality, FACTA pre-indexes not only the words but also the concepts mentioned in the documents, which enables the user to issue a flexible query (e.g. free keywords or Boolean combinations of keywords/concepts) and receive the results immediately, even when the number of documents that match the query is very large. The user can also view snippets from MEDLINE to get textual evidence of associations between the query terms and the concepts. The concept IDs and their names/synonyms for building the indexes were collected from several biomedical databases and thesauri, such as UniProt, BioThesaurus, UMLS, KEGG and DrugBank. Availability: The system is available at http://www.nactem.ac.uk/software/facta/
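The pre-indexing idea can be illustrated with a toy inverted index that stores postings for both words and concept IDs, so a query may mix the two, and that ranks the concepts found in the matching documents. This is a minimal sketch under simplifying assumptions: ranking uses raw co-occurrence counts, only Boolean AND is supported, keys are case-sensitive, and the concept ID format and documents are hypothetical. It does not reproduce FACTA's actual scoring or data.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Set, Tuple


class ConceptIndex:
    """Toy inverted index over words and concept IDs."""

    def __init__(self) -> None:
        self.postings: Dict[str, Set[int]] = defaultdict(set)  # word or concept ID -> doc IDs
        self.doc_concepts: Dict[int, Set[str]] = {}

    def add(self, doc_id: int, words: Iterable[str], concepts: Iterable[str]) -> None:
        concept_set = set(concepts)
        self.doc_concepts[doc_id] = concept_set
        for key in set(words) | concept_set:  # index words and concepts alike
            self.postings[key].add(doc_id)

    def search(self, query: List[str]) -> Set[int]:
        """Documents matching ALL query keys (Boolean AND only)."""
        if not query:
            return set()
        return set.intersection(*(self.postings.get(k, set()) for k in query))

    def related_concepts(self, query: List[str], top: int = 10) -> List[Tuple[str, int]]:
        """Rank concepts by how many matching documents mention them."""
        counts: Counter = Counter()
        for doc_id in self.search(query):
            counts.update(self.doc_concepts[doc_id])
        return counts.most_common(top)


idx = ConceptIndex()
idx.add(1, ["p53", "apoptosis"], ["GENE:TP53", "DISEASE:cancer"])
idx.add(2, ["p53", "cell", "cycle"], ["GENE:TP53", "GENE:MDM2"])
idx.add(3, ["insulin", "secretion"], ["GENE:INS", "DISEASE:diabetes"])
print(idx.related_concepts(["p53"]))
```

Because the concept postings are built ahead of time, answering a query only intersects posting sets and counts concepts in the hits, instead of re-running concept recognition over the retrieved documents.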


International Conference on Computational Linguistics | 1994

A methodology for automatic term recognition

Sophia Ananiadou

The topic of automatic term recognition (ATR) is of great interest, especially with the growth of NLP systems, which are passing from the development stage to the application stage. The application of NLP technology involves customising systems towards specific needs, particularly in specialised domains (sublanguages), which form the main target of the technology. There is thus an urgent need for high-quality, large-scale collections of terminologies (with associated linguistic information) for use in NLP system dictionaries.


International Conference on Computational Linguistics | 1996

Extracting nested collocations

K Frantzi; Sophia Ananiadou

This paper provides an approach to the semi-automatic extraction of collocations from corpora using statistics. The growing availability of large textual corpora and the increasing number of applications of collocation extraction have given rise to various approaches to the topic. In this paper, we address the problem of nested collocations, that is, those that are part of longer collocations. Until now, most approaches have treated substrings of collocations as collocations only if they appeared frequently enough by themselves in the corpus. These techniques left many collocations unextracted. In this paper, we propose an algorithm for the semi-automatic extraction of nested uninterrupted and interrupted collocations, paying particular attention to nested collocations.


Bioinformatics | 2006

Building an abbreviation dictionary using a term recognition approach

Naoaki Okazaki; Sophia Ananiadou

MOTIVATION Acronyms result from a highly productive type of term variation and trigger the need for an acronym dictionary to establish associations between acronyms and their expanded forms. RESULTS We propose a novel method for recognizing acronym definitions in a text collection. Assuming a word sequence co-occurring frequently with a parenthetical expression to be a potential expanded form, our method identifies acronym definitions in a similar manner to the statistical term recognition task. Applied to the whole MEDLINE (7 811 582 abstracts), the implemented system extracted 886 755 acronym candidates and recognized 300 954 expanded forms in reasonable time. Our method outperformed baseline systems, achieving 99% precision and 82-95% recall on our evaluation corpus that roughly emulates the whole MEDLINE. AVAILABILITY AND SUPPLEMENTARY INFORMATION The implementations and supplementary information are available at our web site: http://www.chokkan.org/research/acromine/
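The core assumption in this abstract, that word sequences frequently preceding a parenthetical expression are potential expanded forms, can be sketched as follows. Everything concrete here is a hypothetical choice for illustration: the acronym regex, the eight-word window, and the C-value-like score (frequency minus the mean frequency of longer candidates that end with the same words). It is not the published Acromine scoring.

```python
import re
from collections import Counter
from typing import Dict, List

# Hypothetical acronym pattern: 2-10 letters, digits or hyphens inside parentheses.
PAREN = re.compile(r"\(\s*([A-Za-z0-9-]{2,10})\s*\)")


def expansion_candidates(sentences: List[str], max_words: int = 8) -> Dict[str, Counter]:
    """For each parenthetical acronym, count every word sequence ending right before it."""
    cands: Dict[str, Counter] = {}
    for s in sentences:
        for m in PAREN.finditer(s):
            words = s[:m.start()].split()[-max_words:]
            counter = cands.setdefault(m.group(1), Counter())
            for i in range(len(words)):
                counter[" ".join(words[i:])] += 1
    return cands


def score_candidates(counter: Counter) -> Dict[str, float]:
    """Frequency discounted by the mean frequency of longer candidates ending with this one."""
    scores: Dict[str, float] = {}
    for c, f in counter.items():
        longer = [d for d in counter if d.endswith(" " + c)]
        if longer:
            scores[c] = f - sum(counter[d] for d in longer) / len(longer)
        else:
            scores[c] = float(f)
    return scores


sentences = [
    "levels of tumor necrosis factor (TNF) rose",
    "serum tumor necrosis factor (TNF) was measured",
    "the tumor necrosis factor (TNF) gene",
]
scores = score_candidates(expansion_candidates(sentences)["TNF"])
print(max(scores, key=scores.get))  # → tumor necrosis factor
```

The discount is what separates the true expansion from its neighbours: "necrosis factor" is as frequent as "tumor necrosis factor" but always occurs inside it, while the longer sequences such as "serum tumor necrosis factor" are individually rare.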


Bioinformatics | 2009

U-Compare

Yoshinobu Kano; William A Baumgartner; Luke McCrohon; Sophia Ananiadou; K. Bretonnel Cohen; Lawrence E. Hunter; Jun’ichi Tsujii

Summary: Due to the increasing number of text mining resources (tools and corpora) available to biologists, interoperability issues between these resources are becoming significant obstacles to using them effectively. UIMA, the Unstructured Information Management Architecture, is an open framework designed to aid in the construction of more interoperable tools. U-Compare is built on top of the UIMA framework, and provides both a concrete framework for out-of-the-box text mining and a sophisticated evaluation platform allowing users to run specific tools on any target text, generating both detailed statistics and instance-based visualizations of outputs. U-Compare is a joint project, providing the world's largest, and still growing, collection of UIMA-compatible resources. These resources, originally developed by different groups for a variety of domains, include many well-known tools and corpora. U-Compare can be launched straight from the web, without needing to be manually installed. All U-Compare components are provided ready-to-use and can be combined easily via a drag-and-drop interface without any programming. External UIMA components can also simply be mixed with U-Compare components, without distinguishing between locally and remotely deployed resources. Availability: http://u-compare.org/

Collaboration



Top Co-Authors

John McNaught, University of Manchester
Paul Thompson, University of Manchester
Rafal Rak, University of Manchester
Sampo Pyysalo, Information Technology University
Raheel Nawaz, University of Manchester