Publication


Featured research published by Lucian Vlad Lita.


Meeting of the Association for Computational Linguistics | 2003

tRuEcasIng

Lucian Vlad Lita; Abe Ittycheriah; Salim Roukos; Nanda Kambhatla

Truecasing is the process of restoring case information to badly cased or non-cased text. This paper explores truecasing issues and proposes a statistical, language-modeling-based truecaser which achieves an accuracy of ∼98% on news articles. Task-based evaluation shows a 26% F-measure improvement in named entity recognition when using truecasing. In the context of automatic content extraction, mention detection on automatic speech recognition text is also improved by a factor of 8. Truecasing also enhances machine translation output legibility and yields a BLEU score improvement of 80.2%. This paper argues for the use of truecasing as a valuable component in text processing applications.
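
The paper's truecaser decodes with a trigram language model; the core idea can be sketched with a unigram simplification that restores each word's most frequent observed casing (function names and the toy corpus below are illustrative, not from the paper):

```python
from collections import Counter, defaultdict

def train_truecaser(cased_sentences):
    """Learn each word's most frequent surface casing from cased text.
    A unigram simplification; the paper uses a trigram language model."""
    counts = defaultdict(Counter)
    for sentence in cased_sentences:
        for token in sentence.split():
            counts[token.lower()][token] += 1
    return {low: forms.most_common(1)[0][0] for low, forms in counts.items()}

def truecase(model, text):
    """Restore case to lowercased text; unknown words pass through unchanged."""
    return " ".join(model.get(tok.lower(), tok) for tok in text.split())

corpus = ["The EU summit opened in Paris .",
          "Paris is the capital of France .",
          "The summit closed on Friday ."]
model = train_truecaser(corpus)
print(truecase(model, "the eu summit opened in paris ."))
# prints: The EU summit opened in Paris .
```

A context-free model like this cannot disambiguate words whose casing depends on position or sense, which is exactly what the paper's trigram model with Viterbi decoding addresses.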


Meeting of the Association for Computational Linguistics | 2004

Resource analysis for question answering

Lucian Vlad Lita; Warren A. Hunt; Eric Nyberg

This paper attempts to analyze and bound the utility of various structured and unstructured resources in Question Answering, independent of a specific system or component. We quantify the degree to which gazetteers, web resources, encyclopedia, web documents and web-based query expansion can help Question Answering in general and specific question types in particular. Depending on which resources are used, the QA task may shift from complex answer-finding mechanisms to simpler data extraction methods followed by answer re-mapping in local documents.


Empirical Methods in Natural Language Processing | 2005

BLANC: Learning Evaluation Metrics for MT

Lucian Vlad Lita; Monica Rogati; Alon Lavie

We introduce BLANC, a family of dynamic, trainable evaluation metrics for machine translation. Flexible, parametrized models can be learned from past data and automatically optimized to correlate well with human judgments for different criteria (e.g. adequacy, fluency) using different correlation measures. Towards this end, we discuss ACS (all common skip-ngrams), a practical algorithm with trainable parameters that estimates reference-candidate translation overlap by computing a weighted sum of all common skip-ngrams in polynomial time. We show that the BLEU and ROUGE metric families are special cases of BLANC, and we compare correlations with human judgments across these three metric families. We analyze the algorithmic complexity of ACS and argue that it is more powerful in modeling both local meaning and sentence-level structure, while offering the same practicality as the established algorithms it generalizes.
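
ACS computes a trainable weighted sum over all common skip-ngrams; its unweighted n=2 special case, essentially ROUGE-S-style skip-bigram overlap, can be sketched as follows (function names are illustrative):

```python
from collections import Counter
from itertools import combinations

def skip_bigrams(tokens):
    """All ordered token pairs with any gap between them -- O(n^2) pairs."""
    return Counter(combinations(tokens, 2))

def skip_bigram_overlap(reference, candidate):
    """Clipped fraction of the candidate's skip-bigrams found in the
    reference: the unweighted, n=2 special case of the weighted
    skip-ngram sums that ACS generalizes."""
    ref = skip_bigrams(reference.split())
    cand = skip_bigrams(candidate.split())
    matched = sum(min(count, ref[bg]) for bg, count in cand.items())
    total = sum(cand.values())
    return matched / total if total else 0.0

print(skip_bigram_overlap("a b c", "a c b"))  # 2 of 3 skip-bigrams match
```

ACS replaces the uniform counting here with learned per-gap weights, which is what lets the family interpolate between BLEU-like local matching and structure-sensitive scoring.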


North American Chapter of the Association for Computational Linguistics | 2003

Identifying and tracking entity mentions in a maximum entropy framework

Abraham Ittycheriah; Lucian Vlad Lita; Nanda Kambhatla; Nicolas Nicolov; Salim Roukos; Margo Stys

We present a system for identifying and tracking named, nominal, and pronominal mentions of entities within a text document. Our maximum entropy model for mention detection combines two pre-existing named entity taggers (built to extract different entity categories) and other syntactic and morphological feature streams to achieve competitive performance. We developed a novel maximum entropy model for tracking all mentions of an entity within a document. We participated in the Automatic Content Extraction (ACE) evaluation and performed well. We describe our system and present results of the ACE evaluation.
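
A maximum entropy model of this kind is a multinomial logistic classifier over sparse features. The toy trainer below (all feature names, tags, and hyperparameters are illustrative; the paper's model additionally combines two NE taggers and syntactic/morphological feature streams) shows the mechanics:

```python
import math
from collections import defaultdict

def train_maxent(examples, labels, epochs=200, lr=0.5):
    """Tiny maximum-entropy (multinomial logistic) classifier trained by
    gradient ascent on sparse feature dicts. A toy stand-in for the
    paper's mention-detection model."""
    weights = {y: defaultdict(float) for y in labels}

    def predict(feats):
        # Softmax over linear scores, shifted by the max for stability.
        z = {y: sum(weights[y][f] * v for f, v in feats.items()) for y in labels}
        m = max(z.values())
        exps = {y: math.exp(z[y] - m) for y in labels}
        total = sum(exps.values())
        return {y: e / total for y, e in exps.items()}

    for _ in range(epochs):
        for feats, gold in examples:
            probs = predict(feats)
            for y in labels:
                grad = (1.0 if y == gold else 0.0) - probs[y]  # dLL/dw
                for f, v in feats.items():
                    weights[y][f] += lr * grad * v
    return weights, predict

# Token-level mention detection reduced to two tags for illustration.
examples = [({"word=John": 1.0, "cap": 1.0}, "B-PER"),
            ({"word=runs": 1.0}, "O"),
            ({"word=Mary": 1.0, "cap": 1.0}, "B-PER"),
            ({"word=fast": 1.0}, "O")]
labels = ["B-PER", "O"]
weights, predict = train_maxent(examples, labels)
p = predict({"word=Anna": 1.0, "cap": 1.0})  # unseen word, capitalized
print(max(p, key=p.get))
# prints: B-PER
```

The generalization to an unseen word comes entirely from the shared "cap" feature, which is the point of feature-based maxent models: evidence streams combine additively in the log-linear score.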


Conference on Information and Knowledge Management | 2004

Unsupervised question answering data acquisition from local corpora

Lucian Vlad Lita; Jaime G. Carbonell

Data-driven approaches in question answering (QA) are increasingly common. Since availability of training data for such approaches is very limited, we propose an unsupervised algorithm that generates high quality question-answer pairs from local corpora. The algorithm is ontology independent, requiring only very small seed data as its starting point. Two alternating views of the data make learning possible: 1) question types are viewed as relations between entities and 2) question types are described by their corresponding question-answer pairs. These two aspects of the data allow us to construct an unsupervised algorithm that acquires high precision question-answer pairs. We show the quality of the acquired data for different question types and perform a task-based evaluation. With each iteration, pairs acquired by the unsupervised algorithm are used as training data for a simple QA system. Performance increases with the number of question-answer pairs acquired, confirming the robustness of the unsupervised algorithm. We introduce the notion of semantic drift and show that it is a desirable quality in training data for question answering systems.
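
The alternation between the two views can be illustrated with a deliberately tiny bootstrap: known pairs induce extraction patterns, and patterns extract new pairs. The \w+ slot filling, the birth-year relation, and the assumption that the question entity precedes the answer are all illustrative simplifications, not the paper's setup:

```python
import re

def bootstrap_pairs(corpus, seed_pairs, iterations=2):
    """Toy alternation between the two views of the data: pairs induce
    patterns, patterns extract new pairs."""
    pairs = set(seed_pairs)
    for _ in range(iterations):
        # View 1: pairs -> patterns (slot out both entities in context).
        patterns = set()
        for sent in corpus:
            for q, a in pairs:
                if q in sent and a in sent:
                    patterns.add(sent.replace(q, "<Q>").replace(a, "<A>"))
        # View 2: patterns -> new pairs (match patterns elsewhere).
        for pat in patterns:
            rx = re.escape(pat).replace("<Q>", r"(\w+)").replace("<A>", r"(\w+)")
            for sent in corpus:
                m = re.fullmatch(rx, sent)
                if m:
                    pairs.add((m.group(1), m.group(2)))
    return pairs

corpus = ["Mozart was born in 1756 .", "Einstein was born in 1879 ."]
pairs = bootstrap_pairs(corpus, {("Mozart", "1756")})
print(sorted(pairs))  # [('Einstein', '1879'), ('Mozart', '1756')]
```

In a real corpus, unconstrained patterns like this quickly extract noisy pairs; the paper's precision-oriented acquisition and its analysis of semantic drift address exactly that failure mode.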


Conference on Information and Knowledge Management | 2008

Real-time data pre-processing technique for efficient feature extraction in large scale datasets

Ying Liu; Lucian Vlad Lita; R. Stefan Niculescu; Kun Bai; Prasenjit Mitra; C. Lee Giles

Due to the continuous and rampant increase in the size of domain specific data sources, there is a real and sustained need for fast processing in time-sensitive applications, such as medical record information extraction at the point of care, genetic feature extraction for personalized treatment, as well as off-line knowledge discovery such as creating evidence based medicine. Since parallel multi-string matching is at the core of most data mining tasks in these applications, faster on-line matching in static and streaming data is needed to improve the overall efficiency of such knowledge discovery. To address this data mining need, which is not handled efficiently by traditional information extraction and retrieval techniques, we propose a Block Suffix Shifting-based approach that improves on state-of-the-art multi-string matching algorithms such as Aho-Corasick, Commentz-Walter, and Wu-Manber. The strength of our approach is its ability to exploit the different block structures of domain specific data for off-line and on-line parallel matching. Experiments on several real world datasets show how our approach translates into significant performance improvements.
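
Block Suffix Shifting itself is not reproduced here; as a reference point, the classical Aho-Corasick automaton it is compared against can be sketched with a minimal dict-based implementation (an illustrative sketch, not the paper's code):

```python
from collections import deque

def build_automaton(patterns):
    """Aho-Corasick: a trie plus BFS-computed failure links, so all
    patterns are matched in a single pass over the text."""
    goto, fail, out = [{}], [0], [set()]
    for pat in patterns:
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({}); fail.append(0); out.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].add(pat)
    queue = deque(goto[0].values())
    while queue:
        s = queue.popleft()
        for ch, t in goto[s].items():
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f][ch] if ch in goto[f] and goto[f][ch] != t else 0
            out[t] |= out[fail[t]]  # inherit matches ending at the fail state
            queue.append(t)
    return goto, fail, out

def find_all(text, patterns):
    """Return (end_index, pattern) for every occurrence, in one pass."""
    goto, fail, out = build_automaton(patterns)
    state, hits = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pat in out[state]:
            hits.append((i, pat))
    return hits

print(sorted(find_all("ushers", ["he", "she", "his", "hers"])))
```

The failure links are what the Commentz-Walter and Wu-Manber variants (and, per the abstract, Block Suffix Shifting) improve on by skipping portions of the input instead of advancing one character at a time.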


International Conference on Machine Learning and Applications | 2007

Automatic medical coding of patient records via weighted ridge regression

Jian-Wu Xu; Shipeng Yu; Jinbo Bi; Lucian Vlad Lita; Radu Stefan Niculescu; R. Bharat Rao

In this paper, we apply weighted ridge regression to tackle the highly unbalanced data issue in automatic large-scale ICD-9 coding of medical patient records. Since most of the ICD-9 codes are unevenly represented in the medical records, a weighted scheme is employed to balance positive and negative examples. The weights turn out to be associated with the instance priors from a probabilistic interpretation, and an efficient EM algorithm is developed to automatically update both the weights and the regularization parameter. Experiments on a large-scale real patient database suggest that the weighted ridge regression outperforms the conventional ridge regression and linear support vector machines (SVM).
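
The effect of instance weighting is easy to see in the closed-form solution. The single-feature sketch below (a simplification: the paper works with high-dimensional features and learns the weights via EM) shows how up-weighting the rare positive class shifts the fitted coefficient:

```python
def weighted_ridge_1d(xs, ys, weights, lam=1.0):
    """Closed-form weighted ridge regression for a single feature:
    minimize sum_i s_i * (y_i - w * x_i)^2 + lam * w^2, which gives
    w = sum(s x y) / (sum(s x^2) + lam). Instance weights s_i let the
    rare positive class count more, as in the paper's weighting scheme."""
    num = sum(s * x * y for s, x, y in zip(weights, xs, ys))
    den = sum(s * x * x for s, x in zip(weights, xs)) + lam
    return num / den

xs, ys = [1.0, 1.0, 1.0, 1.0], [1.0, 0.0, 0.0, 0.0]      # one positive in four
print(weighted_ridge_1d(xs, ys, [1, 1, 1, 1], lam=0.0))  # 0.25: positive drowned out
print(weighted_ridge_1d(xs, ys, [3, 1, 1, 1], lam=0.0))  # 0.5: up-weighting rebalances
```

With uniform weights the fit is dominated by the majority negatives; tripling the positive's weight moves the prediction to the balanced value, which is the behavior the EM-updated weights automate across thousands of unevenly represented ICD-9 codes.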


Adaptive Agents and Multi-Agent Systems | 2001

A system for multi-agent coordination in uncertain environments

Lucian Vlad Lita; Jamieson Schulte; Sebastian Thrun

This paper presents a multi-agent architecture for coordinating large numbers of mobile agents (e.g. robots) cooperating in uncertain environments. In particular, the Canadian Traveler Problem (CTP) is the problem of finding a shortest path to a goal location in a graph, where individual edges of the graph might or might not be traversable [1]. The agent has initial probabilistic knowledge about the states of the edges. Whether or not an edge is traversable can only be found out by moving there. Hence, an optimal solution to a CTP is a contingency plan, which offers alternative routes if edges are not available. Finding an optimal contingency plan is known to be NP-hard. We focus on the multi-agent CTP, which involves multiple agents attempting to reach multiple target locations. Finding an optimal solution is even harder, since the space of actions at each point in time is exponential in the number of agents. Our multi-agent architecture approaches the above-mentioned set of intractable problems in an efficient, real-time manner. The architecture supports a large number of mobile, goal-driven information agents that strive to maximize their reward for reaching goals. These agents are coordinated at a higher level by dispatcher agents whose purpose is to maximize the total reward accumulated over time. Extensive experimental results have been obtained in the context of natural disaster relief. Our experiments have been carried out in a realistic simulation of Honduras after Hurricane Mitch destroyed most of the country's infrastructure.
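
Since the optimal contingency plan is NP-hard, practical systems plan against a tractable surrogate. Purely as an illustration (this is a myopic baseline of my own construction, not the paper's architecture), one can run Dijkstra over the surrogate cost cost/p for an edge that is traversable with probability p, so unreliable edges look proportionally more expensive:

```python
import heapq

def expected_cost_path(graph, start, goal):
    """Dijkstra over surrogate edge costs cost/p, where p is the prior
    probability the edge is traversable. A myopic heuristic for CTP,
    not an optimal contingency plan."""
    dist, heap = {start: 0.0}, [(0.0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == goal:
            return d
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, cost, p in graph.get(u, []):
            nd = d + cost / p
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return float("inf")

# Edges as (neighbor, cost, traversal probability); names hypothetical.
graph = {"A": [("B", 1.0, 0.4), ("C", 2.0, 1.0)],
         "B": [("G", 1.0, 1.0)],
         "C": [("G", 1.0, 1.0)]}
print(expected_cost_path(graph, "A", "G"))  # 3.0: the reliable route via C wins
```

A true CTP policy would instead branch on each edge observation, and the multi-agent version adds the exponential joint-action space that the dispatcher architecture is designed to tame.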


Parallel Processing Letters | 2000

Algorithmic Complexity with Page-Based Intelligent Memory

Mark Oskin; Lucian Vlad Lita; Frederic T. Chong; Justin Hensley; Diana Keen

High DRAM densities will make intelligent memory chips a commodity in the next five years [1] [2]. This paper focuses upon a promising model of computation in intelligent memory, Active Pages [3], where computation is associated with each page of memory. Computational hardware scales linearly and inexpensively with data size in this model, reducing the order of many algorithms. This scaling can, for example, reduce linear-time algorithms to O(√n). When page-based intelligent memory chips become available in commodity, they will change the way programmers select and utilize algorithms. In this paper, we analyze the asymptotic performance of several common algorithms as problem sizes scale. We also derive the optimal page size, as a function of problem size, for each algorithm running with intelligent memory. Finally, we validate these analyses with simulation results.
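
The O(√n) figure follows from a simple cost model (a toy model assumed here, not the paper's exact analysis): with page size p, each of the n/p pages reduces its p elements in parallel in O(p), then the host combines the n/p partial results in O(n/p). Minimizing p + n/p gives p = √n and total time O(√n):

```python
import math

def page_model_time(n, p):
    """Toy cost model: O(p) in-page parallel work plus O(n/p) host-side
    combining of per-page partial results (coefficients taken as 1)."""
    return p + n / p

n = 1_000_000
p_opt = int(math.sqrt(n))         # d/dp (p + n/p) = 0  =>  p = sqrt(n)
print(page_model_time(n, p_opt))  # 2000.0, i.e. 2*sqrt(n) for linear work
```

This is also why the optimal page size is problem-size dependent, which is the trade-off the paper derives per algorithm.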


On the Move to Meaningful Internet Systems | 2007

Federated ontology search for the medical domain

Vasco Pedro; Lucian Vlad Lita; Stefan Niculescu; Bharat Rao; Jaime G. Carbonell

In this paper we describe a novel methodology for retrieving and combining information from multiple ontologies for the medical domain. In the last decades the number and diversity of available ontologies for the medical domain has grown considerably. The variety and number of such resources makes the cost of integrating them into an application substantial: often prohibitive for exploratory prototyping, and discouraging for larger-scale integration. Cross-ontology localized merging is proposed as a way to allow for a flexible and scalable solution. This approach also indicates a low maintenance cost and high reusability for different application types within the medical domain.

Collaboration


Dive into Lucian Vlad Lita's collaborations.

Top Co-Authors

Eric Nyberg (Carnegie Mellon University)
Laurie Hiyakumoto (Carnegie Mellon University)
Vasco Pedro (Carnegie Mellon University)
C. Lee Giles (Pennsylvania State University)
Diana Keen (University of California)