
Publication


Featured research published by Irina Matveeva.


Conference on Learning Theory | 2004

Regularization and semi-supervised learning on large graphs

Mikhail Belkin; Irina Matveeva; Partha Niyogi

We consider the problem of labeling a partially labeled graph. This setting may arise in a number of situations, from survey sampling to information retrieval to pattern recognition in manifold settings. It is also of potential practical importance when data is abundant but labeling is expensive or requires human assistance.


Meeting of the Association for Computational Linguistics | 2004

A Geometric View on Bilingual Lexicon Extraction from Comparable Corpora

Eric Gaussier; Jean-Michel Renders; Irina Matveeva; Cyril Goutte; Hervé Déjean

We present a geometric view on bilingual lexicon extraction from comparable corpora, which allows us to re-interpret the methods proposed so far and to identify unresolved problems. This motivates three new methods that aim at solving these problems. Empirical evaluation shows the strengths and weaknesses of these methods, as well as a significant gain in the accuracy of extracted lexicons.


International Conference on Acoustics, Speech, and Signal Processing | 2004

Tikhonov regularization and semi-supervised learning on large graphs

Mikhail Belkin; Irina Matveeva; Partha Niyogi

We consider the problem of labeling a partially labeled graph. This setting may arise in a number of situations, from survey sampling to information retrieval to pattern recognition in manifold settings. It is also of particular practical importance when data is abundant but labeling is expensive or requires human assistance. Our approach develops a framework for regularization on such graphs parallel to Tikhonov regularization on continuous spaces. The algorithms are very simple and involve solving a single, usually sparse, system of linear equations. Using the notion of algorithmic stability, we derive bounds on the generalization error and relate them to structural invariants of the graph.
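The computation described in the abstract — a squared loss on the labeled vertices plus a Laplacian smoothness penalty, minimized by solving one sparse linear system — can be illustrated as follows. This is a hedged sketch of the general idea, not the paper's exact algorithm (which also considers powers of the Laplacian and a stability constraint); the function name and the unconstrained objective are assumptions.

```python
import numpy as np

def tikhonov_graph_labels(W, labeled_idx, y_labeled, gamma=0.1):
    """Sketch of Tikhonov-style regularization on a graph.

    W           : symmetric adjacency (weight) matrix, shape (n, n)
    labeled_idx : indices of labeled vertices
    y_labeled   : labels (+1/-1) for those vertices
    gamma       : smoothness penalty weight

    Minimizes  sum_{i in labeled} (f_i - y_i)^2 + gamma * f^T L f,
    where L = D - W is the graph Laplacian. The minimizer solves a
    single (usually sparse) linear system (I_S + gamma * L) f = y_ext.
    """
    n = W.shape[0]
    L = np.diag(W.sum(axis=1)) - W           # combinatorial Laplacian
    I_S = np.zeros((n, n))                   # indicator of labeled vertices
    I_S[labeled_idx, labeled_idx] = 1.0
    y_ext = np.zeros(n)                      # labels extended by zeros
    y_ext[labeled_idx] = y_labeled
    f = np.linalg.solve(I_S + gamma * L, y_ext)
    return np.sign(f)                        # predicted labels for all vertices
```

On a toy graph of two triangles joined by a single edge, labeling one vertex in each triangle propagates the labels to the rest of each cluster.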


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2006

High accuracy retrieval with multiple nested ranker

Irina Matveeva; Christopher J. C. Burges; Timo Burkard; Andy Laucius; Leon Wong

High precision at the top ranks has become a new focus of research in information retrieval. This paper presents the multiple nested ranker approach that improves the accuracy at the top ranks by iteratively re-ranking the top scoring documents. At each iteration, this approach uses the RankNet learning algorithm to re-rank a subset of the results. This splits the problem into smaller and easier tasks and generates a new distribution of the results to be learned by the algorithm. We evaluate this approach using different settings on a data set labeled with several degrees of relevance. We use the normalized discounted cumulative gain (NDCG) to measure the performance because it depends not only on the position but also on the relevance score of the document in the ranked list. Our experiments show that making the learning algorithm concentrate on the top scoring results improves precision at the top ten documents in terms of the NDCG score.
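The evaluation metric above, NDCG, rewards highly relevant documents near the top of the ranking. A minimal sketch, assuming the common exponential-gain/logarithmic-discount formulation (the paper's exact variant may differ):

```python
import math

def dcg(relevances):
    """Discounted cumulative gain for a ranked list of graded relevance scores."""
    return sum((2 ** rel - 1) / math.log2(i + 2)
               for i, rel in enumerate(relevances))

def ndcg(ranked_relevances, k=10):
    """NDCG@k: DCG of the ranking divided by the DCG of the ideal ranking."""
    ideal_dcg = dcg(sorted(ranked_relevances, reverse=True)[:k])
    return dcg(ranked_relevances[:k]) / ideal_dcg if ideal_dcg > 0 else 0.0
```

A perfectly ordered list scores 1.0; swapping a highly relevant document downward lowers the score more than swapping a marginal one, which is why the metric depends on both position and relevance grade.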


Proceedings of the Workshop on Psychocomputational Models of Human Language Acquisition | 2005

Using Morphology and Syntax Together in Unsupervised Learning

Yu Hu; Irina Matveeva; John Goldsmith; Colin Sprague

Unsupervised learning of grammar is a problem that can be important in many areas, ranging from text preprocessing for information retrieval and classification to machine translation. We describe an MDL-based grammar of a language that contains morphology and lexical categories. We use an unsupervised learner of morphology to bootstrap the acquisition of lexical categories and use these two learning processes iteratively to help and constrain each other. To be able to do so, we need to make our existing morphological analysis less fine-grained. We present an algorithm for collapsing morphological classes (signatures) by using syntactic context. Our experiments demonstrate that this collapse preserves the relation between morphology and lexical categories within new signatures, and thereby minimizes the description length of the model.


Language and Technology Conference | 2006

Document Representation and Multilevel Measures of Document Similarity

Irina Matveeva

We present our work on combining large-scale statistical approaches with local linguistic analysis and graph-based machine learning techniques to compute a combined measure of semantic similarity between terms and documents for application in information extraction, question answering, and summarisation.


PMHLA '05 Proceedings of the Workshop on Psychocomputational Models of Human Language Acquisition | 2005

The SED heuristic for morpheme discovery: a look at Swahili

Yu Hu; Irina Matveeva; John Goldsmith; Colin Sprague

This paper describes a heuristic for morpheme- and morphology-learning based on string edit distance. Experiments with a 7,000 word corpus of Swahili, a language with a rich morphology, support the effectiveness of this approach.
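The heuristic rests on string edit distance; a standard dynamic-programming implementation of Levenshtein distance is sketched below (illustrative only — the paper's alignment and morpheme-discovery details are not reproduced here).

```python
def edit_distance(a, b):
    """Classic Levenshtein distance via dynamic programming: the minimum
    number of insertions, deletions, and substitutions turning a into b."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i                              # delete all of a's prefix
    for j in range(n + 1):
        d[0][j] = j                              # insert all of b's prefix
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,       # deletion
                          d[i][j - 1] + 1,       # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[m][n]
```

For instance, the illustrative Swahili forms ninasoma and anasoma differ by an edit distance of 2, exposing a long shared suffix with differing prefixes — exactly the kind of pair such a heuristic would flag as morphologically related.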


Cross-Language Evaluation Forum | 2004

University of Chicago at CLEF2004: cross-language text and spoken document retrieval

Gina-Anne Levow; Irina Matveeva

The University of Chicago participated in the Cross-Language Evaluation Forum 2004 (CLEF2004) cross-language multilingual, bilingual, and spoken language tracks. Cross-language experiments focused on meeting the challenges of new languages with freely available resources. We found that modest effectiveness could be achieved with the additional application of pseudo-relevance feedback to overcome some gaps in impoverished lexical resources. Experiments with a new dimensionality reduction approach for re-ranking of retrieved results yielded no improvement, however. Finally, spoken document retrieval experiments aimed to meet the challenges of unknown story boundary conditions and noisy retrieval through query-based merger of fine-grained overlapping windows and pseudo-feedback query expansion to enhance retrieval.
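Pseudo-relevance feedback of the kind mentioned above is commonly implemented Rocchio-style: treat the top-ranked documents as relevant and move the query vector toward their centroid. A minimal sketch with assumed function name and weights — not the configuration used in the CLEF experiments:

```python
def rocchio_prf(query_vec, doc_vecs, ranked_ids, k=5, alpha=1.0, beta=0.5):
    """Rocchio-style pseudo-relevance feedback (illustrative sketch).

    query_vec  : query as a list of term weights
    doc_vecs   : document vectors in the same space
    ranked_ids : document indices in initial-retrieval rank order
    k          : number of top documents assumed relevant
    alpha/beta : weights for the original query and the feedback centroid
    """
    top = [doc_vecs[i] for i in ranked_ids[:k]]
    dim = len(query_vec)
    centroid = [sum(d[j] for d in top) / len(top) for j in range(dim)]
    # Expanded query: keep the original direction, pull toward the centroid.
    return [alpha * q + beta * c for q, c in zip(query_vec, centroid)]
```

The expanded query picks up weight on terms that co-occur in the top-ranked documents, which is what lets feedback compensate for gaps in an impoverished bilingual lexicon.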


Workshop on Graph-Based Methods for Natural Language Processing | 2006

Graph-based Generalized Latent Semantic Analysis for Document Representation

Irina Matveeva; Gina-Anne Levow

Document indexing and representation of term-document relations are very important for document clustering and retrieval. In this paper, we combine a graph-based dimensionality reduction method with a corpus-based association measure within the Generalized Latent Semantic Analysis framework. We evaluate the graph-based GLSA on the document clustering task.


Conference of the European Chapter of the Association for Computational Linguistics | 2006

Computing term translation probabilities with generalized latent semantic analysis

Irina Matveeva; Gina-Anne Levow

Term translation probabilities have proved an effective method of semantic smoothing in the language modelling approach to information retrieval tasks. In this paper, we use Generalized Latent Semantic Analysis to compute semantically motivated term and document vectors. The normalized cosine similarity between the term vectors is used as the term translation probability in the language modelling framework. Our experiments demonstrate that GLSA-based term translation probabilities capture semantic relations between terms and improve performance on document classification.
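The normalized-cosine construction can be sketched as follows. Clipping negative cosines and row-normalizing is one plausible reading of "normalized cosine similarity", not necessarily the paper's exact normalization; the function name is an assumption.

```python
import numpy as np

def translation_probs(term_vectors):
    """Turn pairwise cosine similarities between term vectors into
    translation-probability-like weights: cosines are clipped at zero
    and each row is normalized to sum to 1, giving p(w_j | w_i)."""
    norms = np.linalg.norm(term_vectors, axis=1, keepdims=True)
    unit = term_vectors / norms              # unit-length term vectors
    cos = unit @ unit.T                      # pairwise cosine similarities
    cos = np.clip(cos, 0.0, None)            # drop negative similarities
    return cos / cos.sum(axis=1, keepdims=True)
```

Each row then behaves like a conditional distribution over "translations" of a term, concentrated on its semantic neighbors, which is the shape the language-modelling framework expects.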

Collaboration


Dive into Irina Matveeva's collaborations.

Top Co-Authors

Yu Hu

University of Chicago
