Boris Katz

Massachusetts Institute of Technology

Publication

Featured research published by Boris Katz.

International ACM SIGIR Conference on Research and Development in Information Retrieval | 2003

Quantitative evaluation of passage retrieval algorithms for question answering

Stefanie Tellex; Boris Katz; Jimmy J. Lin; Aaron Fernandes; Gregory Marton

Passage retrieval is an important component common to many question answering systems. Because most evaluations of question answering systems focus on end-to-end performance, comparison of common components becomes difficult. To address this shortcoming, we present a quantitative evaluation of various passage retrieval algorithms for question answering, implemented in a framework called Pauchok. We present three important findings: (1) Boolean querying schemes perform well in the question answering task. (2) The performance differences between various passage retrieval algorithms vary with the choice of document retriever, which suggests significant interactions between document retrieval and passage retrieval. (3) The best algorithms in our evaluation employ density-based measures for scoring query terms. Our results reveal future directions for passage retrieval and question answering.
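As a rough illustration of what a density-based measure for scoring query terms can look like, the sketch below rewards passages that contain more distinct query terms packed into a shorter span. It is not one of the algorithms evaluated in the paper and is not part of Pauchok; the function names `density_score` and `rank_passages`, the whitespace tokenization, and the scoring formula are all assumptions made for this example.

```python
def density_score(passage_tokens, query_terms):
    """Toy density score: (distinct matched terms)^2 / span covering all matches.

    Illustrative formula only; it is an assumption, not taken from the paper.
    """
    query = {term.lower() for term in query_terms}
    tokens = [tok.lower() for tok in passage_tokens]
    # Positions in the passage where any query term occurs.
    positions = [i for i, tok in enumerate(tokens) if tok in query]
    if not positions:
        return 0.0
    matched = {tokens[i] for i in positions}
    # Number of tokens from the first match to the last match, inclusive.
    span = positions[-1] - positions[0] + 1
    return len(matched) ** 2 / span


def rank_passages(passages, query_terms):
    """Return (score, passage) pairs sorted from highest to lowest score."""
    scored = [(density_score(p.split(), query_terms), p) for p in passages]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    candidates = [
        "france has many large cities and one capital",
        "the capital of france is paris",
        "bread is tasty",
    ]
    for score, passage in rank_passages(candidates, ["capital", "France"]):
        print(f"{score:.3f}  {passage}")
```

Both of the first two candidates contain the two query terms, but the second keeps them closer together, so it ranks first under this toy measure.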

International Conference on Computational Linguistics | 1988

Exploiting lexical regularities in designing natural language systems

Boris Katz; Beth Levin

This paper presents the lexical component of the START Question Answering system developed at the MIT Artificial Intelligence Laboratory. START is able to interpret correctly a wide range of semantic relationships associated with alternate expressions of the arguments of verbs. The design of the system takes advantage of the results of recent linguistic research into the structure of the lexicon, allowing START to attain a broader range of coverage than many existing systems while maintaining modular organization.

Conference on Information and Knowledge Management | 2003

Question answering from the web using knowledge annotation and knowledge mining techniques

Jimmy J. Lin; Boris Katz

We present a strategy for answering fact-based natural language questions that is guided by a characterization of real-world user queries. Our approach, implemented in a system called Aranea, extracts answers from the Web using two different techniques: knowledge annotation and knowledge mining. Knowledge annotation is an approach to answering large classes of frequently occurring questions by utilizing semi-structured and structured Web sources. Knowledge mining is a statistical approach that leverages massive amounts of Web data to overcome many natural language processing challenges. We have integrated these two different paradigms into a question answering system capable of providing users with concise answers that directly address their information needs.

Recent Advances in Natural Language Processing | 2000

REXTOR: A System for Generating Relations from Natural Language

Boris Katz; Jimmy J. Lin

This paper argues that a finite-state language model with a ternary expression representation is currently the most practical and suitable bridge between natural language processing and information retrieval. Despite the theoretical computational inadequacies of finite-state grammars, they are very cost effective (in time and space requirements) and adequate for practical purposes. The ternary expressions that we use are not only linguistically-motivated, but also amenable to rapid large-scale indexing. REXTOR (Relations EXtracTOR) is an implementation of this model; in one uniform framework, the system provides two separate grammars for extracting arbitrary patterns of text and building ternary expressions from them. These content representational structures serve as the input to our ternary expressions indexer. This approach to natural language information retrieval promises to significantly raise the performance of current systems.
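To make the idea of indexing ternary expressions concrete, the following sketch stores (subject, relation, object) triples and retrieves documents by matching on any combination of the three slots. It does not reproduce REXTOR's grammars or its indexer; the class name `TernaryIndex`, its methods, and the hand-written triples are hypothetical and stand in for whatever an extraction step would produce.

```python
from collections import defaultdict


class TernaryIndex:
    """Toy index from (subject, relation, object) triples to document ids.

    Hypothetical structure for illustration; not REXTOR's actual indexer.
    """

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, doc_id, subject, relation, obj):
        """Record that doc_id contains the given ternary expression."""
        self._postings[(subject, relation, obj)].add(doc_id)

    def query(self, subject=None, relation=None, obj=None):
        """Return sorted doc ids whose triples match; None acts as a wildcard."""
        pattern = (subject, relation, obj)
        hits = set()
        for triple, doc_ids in self._postings.items():
            if all(want is None or want == got
                   for want, got in zip(pattern, triple)):
                hits |= doc_ids
        return sorted(hits)


if __name__ == "__main__":
    index = TernaryIndex()
    # Triples are written by hand here; a real system would extract them.
    index.add("d1", "bird", "eat", "worm")
    index.add("d2", "worm", "eat", "bird")
    print(index.query(subject="bird", relation="eat"))
    print(index.query(relation="eat"))
```

The two example documents share the same keywords, so a bag-of-words index could not tell them apart, whereas the slot-based query separates them by who does what to whom.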

Meeting of the Association for Computational Linguistics | 2003

Extracting Structural Paraphrases from Aligned Monolingual Corpora

Ali Ibrahim; Boris Katz; Jimmy J. Lin

We present an approach for automatically learning paraphrases from aligned monolingual corpora. Our algorithm works by generalizing the syntactic paths between corresponding anchors in aligned sentence pairs. Compared to previous work, structural paraphrases generated by our algorithm tend to be much longer on average, and are capable of capturing long-distance dependencies. In addition to a standalone evaluation of our paraphrases, we also describe a question answering application currently under development that could immensely benefit from automatically-learned structural paraphrases.

Human Factors in Computing Systems | 2003

The role of context in question answering systems

Jimmy J. Lin; Dennis Quan; Vineet Sinha; Karun Bakshi; David F. Huynh; Boris Katz; David R. Karger

Despite recent advances in natural language question answering technology, the problem of designing effective user interfaces has been largely unexplored. We conducted a user study to investigate the problem and discovered that overall, users prefer a paragraph-sized chunk of text over just an exact phrase as the answer to their questions. Furthermore, users generally prefer answers embedded in context, regardless of the perceived reliability of the source documents. When users research a topic, increasing the amount of text returned to users significantly decreases the number of queries that they pose to the system, suggesting that users utilize supporting text to answer related questions. We believe that these results can serve to guide future developments in question answering user interfaces.

International Joint Conference on Natural Language Processing | 2005

A comparative study of language models for book and author recognition

Özlem Uzuner; Boris Katz

Linguistic information can help improve evaluation of similarity between documents; however, the kind of linguistic information to be used depends on the task. In this paper, we show that distributions of syntactic structures capture the way works are written and accurately identify individual books more than 76% of the time. In comparison, baseline features, e.g., tfidf-weighted keywords, function words, etc., give an accuracy of at most 66%. However, testing the same features on authorship attribution shows that distributions of syntactic structures are less successful than function words on this task; syntactic structures vary even among the works of the same author whereas features such as function words are distributed more similarly among the works of an author and can more effectively capture authorship.

Proceedings of the Second Workshop on Building Educational Applications Using NLP | 2005

Using Syntactic Information to Identify Plagiarism

Özlem Uzuner; Boris Katz; Thade Nahnsen

Using keyword overlaps to identify plagiarism can result in many false negatives and positives: substitution of synonyms for each other reduces the similarity between works, making it difficult to recognize plagiarism; overlap in ambiguous keywords can falsely inflate the similarity of works that are in fact different in content. Plagiarism detection based on verbatim similarity of works can be rendered ineffective when works are paraphrased even in superficial and immaterial ways. Considering linguistic information related to creative aspects of writing can improve identification of plagiarism by adding a crucial dimension to evaluation of similarity: documents that share linguistic elements in addition to content are more likely to be copied from each other. In this paper, we present a set of low-level syntactic structures that capture creative aspects of writing and show that information about linguistic similarities of works improves recognition of plagiarism (over tfidf-weighted keywords alone) when combined with similarity measurements based on tfidf-weighted keywords.
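The keyword baseline mentioned in this abstract can be sketched as cosine similarity over tf-idf-weighted term vectors. The code below covers only that baseline, not the syntactic features the paper proposes; the whitespace tokenizer, the smoothed idf formula, and the function names `tfidf_vectors` and `cosine` are assumptions chosen for this example.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build one sparse tf-idf vector (dict term -> weight) per document.

    Uses a smoothed idf, log((1 + N) / (1 + df)) + 1, which is an assumed
    weighting choice for this sketch.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine(vec_a, vec_b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * vec_b.get(term, 0.0) for term, weight in vec_a.items())
    norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
    norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    docs = [
        "the cat sat on the mat",
        "the feline rested on the rug",
        "stock prices fell sharply today",
    ]
    vecs = tfidf_vectors(docs)
    print(cosine(vecs[0], vecs[1]))
    print(cosine(vecs[0], vecs[2]))
```

The second document paraphrases the first with synonyms, so only the function words overlap and the keyword similarity drops; this is the kind of false negative the abstract attributes to keyword-only comparison.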

Cooperative Information Systems | 2002

Natural Language Annotations for the Semantic Web

Boris Katz; Jimmy J. Lin; Dennis Quan

Because the ultimate purpose of the Semantic Web is to help users locate, organize, and process information, we strongly believe that it should be grounded in the information access method humans are most comfortable with: natural language. However, the Resource Description Framework (RDF), the foundation of the Semantic Web, was designed to be easily processed by computers, not humans. To render RDF friendlier to humans, we propose to augment it with natural language annotations, or metadata written in everyday language. We argue that natural language annotations are not only intuitive and effective, but can also accelerate the pace with which the Semantic Web is being adopted. We demonstrate the use of natural language annotations from within Haystack, an end user Semantic Web platform that also serves as a testbed for our ideas. In addition to a prototype Semantic Web question answering system, we describe other opportunities for marrying natural language and Semantic Web technology.

International Conference on Computational Linguistics | 2002

Annotating the semantic web using natural language

Boris Katz; Jimmy J. Lin

Because the ultimate purpose of the Semantic Web is to help users better locate, organize, and process content, we believe that it should be grounded in the information access method humans are most comfortable with: natural language. However, the Resource Description Framework (RDF), the foundation of the Semantic Web, was designed to be easily processed by computers, not humans. To render RDF more friendly to humans, we propose to augment it with natural language annotations, or metadata written in everyday language. We argue that natural language annotations, parsed into computer-readable representations, are not only intuitive and effective, but can also accelerate the pace with which the Semantic Web is being adopted. We believe that our technology can facilitate a happy marriage between natural language technology and the Semantic Web vision.