Valentin Tablan | Researchain

Publication

Featured research published by Valentin Tablan.

Meeting of the Association for Computational Linguistics | 2002

GATE: an Architecture for Development of Robust HLT applications

Hamish Cunningham; Diana Maynard; Kalina Bontcheva; Valentin Tablan

In this paper we present GATE, a framework and graphical development environment which enables users to develop and deploy language engineering components and resources in a robust fashion. The GATE architecture has enabled us not only to develop a number of successful applications for various language processing tasks (such as Information Extraction), but also to build and annotate corpora and carry out evaluations on the applications generated. The framework can be used to develop applications and resources in multiple languages, based on its thorough Unicode support.

Natural Language Engineering | 2004

Evolving GATE to meet new challenges in language engineering

Kalina Bontcheva; Valentin Tablan; Diana Maynard; Hamish Cunningham

In this paper we present recent work on GATE, a widely-used framework and graphical development environment for creating and deploying Language Engineering components and resources in a robust fashion. The GATE architecture has facilitated the development of a number of successful applications for various language processing tasks (such as Information Extraction, dialogue and summarisation), the building and annotation of corpora and the quantitative evaluations of LE applications. The focus of this paper is on recent developments in response to new challenges in Language Engineering: Semantic Web, integration with Information Retrieval and data mining, and the need for machine learning support.

European Semantic Web Conference | 2008

A natural language query interface to structured information

Valentin Tablan; Danica Damljanovic; Kalina Bontcheva

Accessing structured data such as that encoded in ontologies and knowledge bases can be done using either syntactically complex formal query languages like SPARQL or complicated form interfaces that require expensive customisation to each particular application domain. This paper presents the QuestIO system, a natural language interface for accessing structured information that is domain independent and easy to use without training. It aims to bring the simplicity of Google's search interface to conceptual retrieval by automatically converting short conceptual queries into formal ones, which can then be executed against any semantic repository. QuestIO was developed specifically to be robust with regard to language ambiguities and incomplete or syntactically ill-formed queries, by harnessing the structure of ontologies, fuzzy string matching, and ontology-motivated similarity metrics.

Natural Language Engineering | 2002

Architectural elements of language engineering robustness

Diana Maynard; Valentin Tablan; Hamish Cunningham; Cristian Ursu; Horacio Saggion; Kalina Bontcheva; Yorick Wilks

We discuss robustness in LE systems from the perspective of engineering, and the predictability of both outputs and construction process that this entails. We present an architectural system that contributes to engineering robustness and low-overhead systems development (GATE, a General Architecture for Text Engineering). To verify our ideas we present results from the development of a multi-purpose cross-genre Named Entity recognition system. This system aims to be robust across diverse input types, and to reduce the need for costly and time-consuming adaptation of systems to new applications, with its capability to process texts from widely differing domains and genres.

International Semantic Web Conference | 2007

CLOnE: controlled language for ontology editing

Adam Funk; Valentin Tablan; Kalina Bontcheva; Hamish Cunningham; Brian Davis; Siegfried Handschuh

This paper presents a controlled language for ontology editing and a software implementation, based partly on standard NLP tools, for processing that language and manipulating an ontology. The input sentences are analysed deterministically and compositionally with respect to a given ontology, which the software consults in order to interpret the input's semantics; this allows the user to learn fewer syntactic structures since some of them can be used to refer to either classes or instances, for example. A repeated-measures, task-based evaluation has been carried out in comparison with a well-known ontology editor; our software received favourable results for basic tasks. The paper also discusses work in progress and future plans for developing this language and tool.

International World Wide Web Conferences | 2005

Web-assisted annotation, semantic indexing and search of television and radio news

Mike Dowman; Valentin Tablan; Hamish Cunningham; Borislav Popov

The Rich News system, that can automatically annotate radio and television news with the aid of resources retrieved from the World Wide Web, is described. Automatic speech recognition gives a temporally precise but conceptually inaccurate annotation model. Information extraction from related web news sites gives the opposite: conceptual accuracy but no temporal data. Our approach combines the two for temporally accurate conceptual semantic annotation of broadcast news. First low quality transcripts of the broadcasts are produced using speech recognition, and these are then automatically divided into sections corresponding to individual news stories. A key phrases extraction component finds key phrases for each story and uses these to search for web pages reporting the same event. The text and meta-data of the web pages is then used to create index documents for the stories in the original broadcasts, which are semantically annotated using the KIM knowledge management platform. A web interface then allows conceptual search and browsing of news stories, and playing of the parts of the media files corresponding to each news story. The use of material from the World Wide Web allows much higher quality textual descriptions and semantic annotations to be produced than would have been possible using the ASR transcript directly. The semantic annotations can form a part of the Semantic Web, and an evaluation shows that the system operates with high precision, and with a moderate level of recall.

International Journal on Digital Libraries | 2004

Text mining in a digital library

Ian H. Witten; Katherine J. Don; Michael Dewsnip; Valentin Tablan

Digital librarians strive to add value to the collections they create and maintain. One way is through selectivity: a carefully chosen set of authoritative documents in a particular topic area is far more useful to those working in the area than a huge, unfocused collection (like the Web). Another is by augmenting the collection with high-quality metadata, which supports activities of searching and browsing in a uniform and useful way. A third way, and our topic here, is to enrich the documents by examining their content, extracting information, and using it to enhance the ways they can be located and presented. Text mining is a burgeoning new field that attempts to glean meaningful information from natural-language text. It may be loosely characterized as the process of analyzing text to extract information that is useful for particular purposes. It most commonly targets text whose function is the communication of factual information or opinions, and the motivation for trying to extract information from such text automatically is compelling – even if success is only partial. “Text mining” (sometimes called “text data mining”; [4]) defies tight definition but encompasses a wide range of activities: text summarization; document retrieval; document clustering; text categorization; language identification; authorship ascription; identifying phrases, phrase structures, and key phrases; extracting “entities” such as names, dates, and abbreviations; locating acronyms and their definitions; filling predefined templates with extracted information; and even learning rules from such templates [8]. Techniques of text mining have much to offer digital libraries and their users. Here we describe the marriage of a widely used digital library system (Greenstone) with a development environment for text mining (GATE) to enrich the library reader’s experience. The work is in progress: one level of integration has been demonstrated and another is planned. The project has been greatly facilitated by the fact that both systems are publicly available under the GNU public license – and, in addition, this means that the benefits gained by leveraging text mining techniques will accrue to all Greenstone users.

Philosophical Transactions of the Royal Society A | 2012

GATECloud.net: a Platform for Large-Scale, Open-Source Text Processing on the Cloud

Valentin Tablan; Ian Roberts; Hamish Cunningham; Kalina Bontcheva

Cloud computing is increasingly being regarded as a key enabler of the ‘democratization of science’, because on-demand, highly scalable cloud computing facilities enable researchers anywhere to carry out data-intensive experiments. In the context of natural language processing (NLP), algorithms tend to be complex, which makes their parallelization and deployment on cloud platforms a non-trivial task. This study presents a new, unique, cloud-based platform for large-scale NLP research: GATECloud.net. It enables researchers to carry out data-intensive NLP experiments by harnessing the vast, on-demand compute power of the Amazon cloud. Important infrastructural issues are dealt with by the platform, completely transparently for the researcher: load balancing, efficient data upload and storage, deployment on the virtual machines, security and fault tolerance. We also include a cost–benefit analysis and usage evaluation.

International Semantic Web Conference | 2008

RoundTrip Ontology Authoring

Brian Davis; Ahmad Ali Iqbal; Adam Funk; Valentin Tablan; Kalina Bontcheva; Hamish Cunningham; Siegfried Handschuh

Controlled Language (CL) for Ontology Editing tools offer an attractive alternative for naive users wishing to create ontologies, but they are still required to spend time learning the correct syntactic structures and vocabulary in order to use the Controlled Language properly. This paper extends previous work (CLOnE) which uses standard NLP tools to process the language and manipulate an ontology. Here we also generate text in the CL from an existing ontology using template-based (or shallow) Natural Language Generation (NLG). The text generator and the CLOnE authoring process combine to form a RoundTrip Ontology Authoring environment: one can start with an existing imported ontology or one originally produced using CLOnE, (re)produce the Controlled Language, modify or edit the text as required and then turn the text back into the ontology in the CLOnE environment. Building on previous methodology we undertook an evaluation, comparing the RoundTrip Ontology Authoring process with a well-known ontology editor; where previous work required a CL reference manual with several examples in order to use the controlled language, the use of NLG reduces this learning curve for users and improves on existing results for basic ontology editing tasks.

Database and Expert Systems Applications | 2002

Developing reusable and robust language processing components for information systems using GATE

Kalina Bontcheva; Hamish Cunningham; Diana Maynard; Valentin Tablan; Horacio Saggion

In this paper we present GATE, an architecture and a graphical development environment which enables users to develop and deploy HLT applications in a robust fashion. GATE also provides reusable, extendable, and customisable language processing modules (e.g., part of speech tagger, named entity recognition grammars), which combined with the extensive document format support (e.g., XML, HTML), form a useful toolset for building HLT-augmented information systems.
