Publications


Featured research published by Dmitri Roussinov.


Decision Support Systems | 2003

Automatic discovery of similarity relationships through Web mining

Dmitri Roussinov; J. Leon Zhao

This work demonstrates how the World Wide Web can be mined in a fully automated manner to discover semantic similarity relationships among the concepts surfaced during an electronic brainstorming session, and thus to improve the accuracy of automatically clustering meeting messages. Our novel Context Sensitive Similarity Discovery (CSSD) method takes advantage of the meeting context when selecting a subset of Web pages for data mining, and then conducts regular concept co-occurrence analysis within that subset. Our results have implications for reducing information overload in applications of text technologies such as email filtering, document retrieval, text summarization, and knowledge management.
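
As a rough illustration of the concept co-occurrence analysis described above, here is a minimal Python sketch that scores concept pairs by a PMI-style measure of how often they co-occur in a small set of page texts; the pages, concepts, and scoring formula are illustrative assumptions, not the paper's exact CSSD procedure.

```python
# Sketch: co-occurrence-based similarity over a context-specific corpus of
# page texts (toy data; not the paper's exact CSSD algorithm).
from itertools import combinations
from math import log


def cooccurrence_similarity(pages, concepts):
    """Score concept pairs by a PMI-style measure of page co-occurrence."""
    page_terms = [set(p.lower().split()) for p in pages]
    n = len(page_terms)
    freq = {c: sum(c in terms for terms in page_terms) for c in concepts}
    sims = {}
    for a, b in combinations(concepts, 2):
        both = sum(a in terms and b in terms for terms in page_terms)
        if both and freq[a] and freq[b]:
            sims[(a, b)] = log(both * n / (freq[a] * freq[b]))
        else:
            sims[(a, b)] = 0.0
    return sims


# Toy "context-specific" corpus: pages retrieved for the meeting's topic.
pages = [
    "cutting the project budget will reduce cost overruns",
    "budget cuts and cost overruns in large projects",
    "team morale depends on clear communication",
]
print(cooccurrence_similarity(pages, ["budget", "overruns", "morale"]))
```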


Information Processing and Management | 2001

Information navigation on the web by clustering and summarizing query results

Dmitri Roussinov; Hsinchun Chen

We report our experience with a novel approach to interactive information seeking that is grounded in the idea of summarizing query results through automated document clustering. We went through a complete system development and evaluation cycle: designing the algorithms and interface for our prototype, implementing them, and testing them with human users. Our prototype acted as an intermediate layer between the user and a commercial Internet search engine (AltaVista), thus allowing searches over a significant portion of the World Wide Web. In our final evaluation, we processed data from 36 users and concluded that our prototype improved search performance over using the same search engine (AltaVista) directly. We also analyzed the effects of various demographic and task-related parameters.
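
The general idea of summarizing query results by clustering can be sketched as follows: cluster the result snippets with TF-IDF and k-means, then label each cluster with its top terms. The snippets, parameters, and labeling heuristic below are invented for illustration and are not the prototype's actual algorithm.

```python
# Sketch: cluster search-result snippets and summarise each cluster by its
# top TF-IDF terms (illustrative only; not the paper's exact method).
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

snippets = [
    "jaguar is a large cat native to the americas",
    "the jaguar big cat hunts at night",
    "jaguar cars produces luxury sedans",
    "the new jaguar sedan was unveiled at the motor show",
]

vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(snippets)

km = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = km.fit_predict(X)

terms = vectorizer.get_feature_names_out()
for c in range(km.n_clusters):
    top = km.cluster_centers_[c].argsort()[::-1][:3]   # highest-weight terms
    members = [i for i, l in enumerate(labels) if l == c]
    print(f"cluster {c}: label = {[terms[t] for t in top]}, snippets = {members}")
```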


IEEE Transactions on Neural Networks | 2009

A Multitask Learning Model for Online Pattern Recognition

Seiichi Ozawa; Asim Roy; Dmitri Roussinov

This paper presents a new learning algorithm for multitask pattern recognition (MTPR) problems. We consider learning multiple multiclass classification tasks online, where no information is ever provided about the task category of a training example. The algorithm thus needs an automated task recognition capability to properly learn the different classification tasks. The learning mode is "online": training examples for different tasks are mixed in random fashion and given sequentially, one after another. We assume that the classification tasks are related to each other and that both the tasks and their training examples appear at random during online training. Thus, the learning algorithm has to continually switch from learning one task to another whenever the training examples change to a different task. This also implies that the learning algorithm has to detect task changes automatically and utilize knowledge of previous tasks to learn new tasks quickly. The performance of the algorithm is evaluated for ten MTPR problems using five University of California at Irvine (UCI) data sets. The experiments verify that the proposed algorithm can indeed acquire and accumulate task knowledge and that the transfer of knowledge from tasks already learned enhances both the speed of knowledge acquisition on new tasks and the final classification accuracy. In addition, the task categorization accuracy is greatly improved for all MTPR problems by introducing the reorganization process, even if the presentation order of class training examples is fairly biased.
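
A drastically simplified sketch of the core idea, learning online without task labels and spawning a new task model when no existing one accounts for an incoming example, is shown below; the nearest-prototype models, distance threshold, and toy data are assumptions made for illustration and do not reproduce the paper's MTPR algorithm.

```python
# Toy sketch of online learning with automatic task recognition
# (a heavy simplification of the paper's MTPR setting).
import numpy as np


class TaskModel:
    """One task = a set of running class-mean prototypes."""

    def __init__(self):
        self.means = {}   # class label -> running mean feature vector
        self.counts = {}  # class label -> number of examples seen

    def update(self, x, y):
        if y not in self.means:
            self.means[y] = x.astype(float)
            self.counts[y] = 1
        else:
            self.counts[y] += 1
            self.means[y] += (x - self.means[y]) / self.counts[y]

    def covers(self, x, radius):
        """Heuristic task-recognition test: the example belongs to this task
        if it falls near the region of feature space the task occupies."""
        return any(np.linalg.norm(x - m) < radius for m in self.means.values())


def online_mtpr(stream, radius=2.0):
    """Process (x, y) examples one by one; no task labels are ever given."""
    tasks, assignments = [], []
    for x, y in stream:
        owner = next((i for i, t in enumerate(tasks) if t.covers(x, radius)), None)
        if owner is None:           # no existing task explains the example,
            tasks.append(TaskModel())
            owner = len(tasks) - 1  # so treat it as the start of a new task
        tasks[owner].update(x, y)
        assignments.append(owner)
    return tasks, assignments


# Two toy "tasks" living in different regions of feature space.
rng = np.random.default_rng(0)
task_a = [(rng.normal(0, 0.3, 2), y) for y in [0, 1] * 5]
task_b = [(rng.normal(5, 0.3, 2), y) for y in [0, 1] * 5]
_, assignments = online_mtpr(task_a + task_b)
print(assignments)  # examples from the two regions end up in two tasks
```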


Communications of the ACM | 2008

Beyond keywords: Automated question answering on the web

Dmitri Roussinov; Weiguo Fan; Jose Antonio Robles-Flores

Beyond Google, emerging question-answering systems respond to natural-language queries.


ACM International Conference on Digital Libraries | 1998

Information forage through adaptive visualization

Dmitri Roussinov; Marshall Ramsey

Automatically created maps of concepts improve navigation in a collection of text documents. We report our research on enhancing navigation by providing, interactively, the ability to modify the maps themselves. We believe that this functionality leads to better responsiveness to the user and a more effective search. For this purpose we have created and tested a prototype system that builds and refines, in real time, a map of concepts found in Web documents returned by a commercial search engine.


Hawaii International Conference on System Sciences | 2005

Automated Question Answering From Lecture Videos: NLP vs. Pattern Matching

Jinwei Cao; Jose Antonio Robles-Flores; Dmitri Roussinov; Jay F. Nunamaker

This paper explores the feasibility of automated question answering from lecture video materials used in conjunction with PowerPoint slides. Two popular approaches to question answering are discussed, each separately tested on the text extracted from videotaped lectures: 1) the approach based on Natural Language Processing (NLP) and 2) a self-learning probabilistic pattern matching approach. The results of the comparison and our qualitative observations are presented. The advantages and shortcomings of each approach are discussed in the context of video applications for e-learning or knowledge management.
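
The pattern-matching flavour of question answering can be illustrated with a toy example: rewrite the question into an answer-phrase pattern and scan the transcript text for a match. The transcript, rewrite rule, and regular expression below are hypothetical and much simpler than either of the systems compared in the paper.

```python
# Sketch of pattern-matching question answering over lecture transcript text
# (hand-written toy rule; not either system evaluated in the paper).
import re

transcript = (
    "In this lecture we discuss TCP. TCP was invented by Vint Cerf and Bob Kahn. "
    "It provides reliable, ordered delivery of a byte stream."
)

# One rewrite rule: "Who invented X?" -> "X was invented by <answer>"
patterns = {
    r"who invented (.+)\?": r"{0} was invented by ([A-Z][\w ]+)",
}


def answer(question, text):
    for q_pat, a_template in patterns.items():
        m = re.match(q_pat, question, re.IGNORECASE)
        if m:
            a_pat = a_template.format(re.escape(m.group(1)))
            hit = re.search(a_pat, text)
            if hit:
                return hit.group(1).strip()
    return None


print(answer("Who invented TCP?", transcript))  # -> Vint Cerf and Bob Kahn
```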


Text Retrieval Conference | 2005

Building on Redundancy: Factoid Question Answering, Robust Retrieval and the "Other"

Dmitri Roussinov; Elena Filatova; Michael Chau; Jose Antonio Robles-Flores

We have explored how redundancy-based techniques can be used to improve factoid question answering, definitional ("other") questions, and robust retrieval. For the factoids, we explored the meta approach: we submitted the questions to several open-domain question answering systems available on the Web and applied our redundancy-based triangulation algorithm to analyze their outputs and identify the most promising answers. Our results support the added value of the meta approach: the performance of the combined system surpassed the performance of each of its components. To answer definitional ("other") questions, we looked for sentences containing recurring pairs of noun entities that include elements of the target. For robust retrieval, we applied our redundancy-based Internet mining technique to identify concepts (single-word terms or phrases) highly related to the topic (query) and expanded the queries with them. All our results are above the mean performance in the categories in which we participated, with one of our robust runs being the best in its category among all 24 participants. Overall, our findings support the hypothesis that using as much textual data as possible, specifically data mined from the World Wide Web, is extremely promising.

FACTOID QUESTION ANSWERING

The Natural Language Processing (NLP) task behind Question Answering (QA) technology is known to be Artificial Intelligence (AI) complete: it requires computers to be as intelligent as people, to understand the deep semantics of human communication, and to be capable of common-sense reasoning. As a result, different systems have different capabilities. They vary in the range of tasks they support, the types of questions they can handle, and the ways in which they present the answers. Following the example of meta search engines on the Web (Selberg & Etzioni, 1995), we advocate combining several fact-seeking engines into a single "meta" approach. Meta search engines (sometimes called metacrawlers) take a query consisting of keywords (e.g., "rotary engines"), send it to several portals (e.g., Google, MSN, etc.), and then combine the results, which gives them better coverage and specialization. Examples include MetaCrawler (Selberg & Etzioni, 1995), 37.com (www.37.com), and Dogpile (www.dogpile.com). Although keyword-based meta search engines have been suggested and explored in the past, we are not aware of a similar approach being tried for the task of open-domain/corpus question answering (fact seeking).

The practical benefits of the meta approach are justified by a general consideration: eliminating the "weakest link" dependency. It does not rely on a single system, which may fail or may simply not be designed for a specific type of task (question). The meta approach promises higher coverage and recall of the correct answers, since different QA engines may cover different databases or different parts of the Web. In addition, the meta approach can reduce subjectivity by querying several engines; as in the real world, one can gather the views of several people in order to make the answers more accurate and objective. The speed provided by several systems queried in parallel can also significantly exceed that obtained by working with only one system, since responsiveness may vary with the task and network traffic conditions.

In addition, the meta approach fits nicely into the increasingly popular Web services model, where each service (QA engine) is independently developed and maintained and the meta engine integrates them while remaining organizationally independent from them. Since each engine may be provided by a commercial company interested in increasing its advertising revenue or by a research group showcasing its cutting-edge technology, the competition mechanism also ensures quality and diversity among the services. Finally, a meta engine can be customized for a particular portal, such as those supporting business intelligence or education, or serving visually impaired or mobile phone users.

Figure 1. Example of START output. Figure 2. Example of BrainBoost output.

Meta Approach Defined

We define a fact-seeking meta engine as a system that can combine, analyze, and represent the answers obtained from several underlying systems (called answer services throughout this paper). At least some of these underlying services have to be capable of providing candidate answers to some types of questions asked in natural language form; otherwise the overall architecture would be no different from a single fact-seeking engine, which is typically built on top of a commercial keyword search engine, e.g., Google. The technology behind each answer service can be as complex as deep semantic NLP or as simple as shallow pattern matching.

Fact Seeking Service | Web address | Output format | Organization/System | Performance in our evaluation (MRR)
START | start.csail.mit.edu | Single answer sentence | Research prototype | 0.049**
AskJeeves | www.ask.com | Up to 200 ordered snippets | Commercial | 0.397**
BrainBoost | www.brainboost.com | Up to 4 snippets | Commercial | 0.409*
ASU QA on the Web | qa.wpcarey.asu.edu | Up to 20 ordered sentences | Research prototype | 0.337**
Wikipedia | en.wikipedia.org | Narrative | Non-profit | 0.194**
ASU Meta QA | http://qa.wpcarey.asu.edu/ | Precise answer | Research prototype | 0.435

Table 1. The fact-seeking services involved, their characteristics, and their performance in the evaluation on the 2004 questions. * and ** indicate 0.1 and 0.05 levels of statistical significance of the difference from the best, respectively.

Challenges Faced and Addressed

Combining multiple fact-seeking engines also faces several challenges. First, their output formats may differ: some engines produce an exact answer (e.g., START), while others present one sentence or an entire snippet (several sentences), similar to web search engines, as shown in Figures 1-4. Table 1 summarizes those differences and other capabilities of the popular fact-seeking engines. Second, the accuracy of responses may differ overall and vary even more with the specific type of question. Finally, we have to deal with multiple answers, so removing duplicates and resolving answer variations is necessary. The issues of merging search results from multiple engines have already been explored by MetaCrawler (Selberg & Etzioni, 1995) and by fusion studies in information retrieval (e.g., Vogt & Cottrell, 1999), but only in the context of merging lists of retrieved text documents. We argue that the task of fusing multiple short answers, which may potentially conflict with or confirm each other, is fundamentally different and poses a new challenge for researchers. For example, some answer services (components) may be very precise (e.g., START) but cover only a small proportion of questions. They need to be backed up by less precise services that have higher coverage (e.g., AskJeeves). However, backing up may easily dilute the answer set with spurious (wrong) answers. Thus, there is a need for some kind of triangulation of the candidate answers provided by the different services, or of multiple candidate answers provided by the same service.

Figure 3. Example of AskJeeves output. Figure 4. Example of ASU QA output.

Triangulation, a term widely used in intelligence and journalism, stands for confirming or disconfirming facts by using multiple sources. Roussinov et al. (2004) went one step further than the frequency counts explored earlier by Dumais et al. (2002) and by groups involved in TREC competitions: they explored a more fine-grained triangulation process, which we also used in our prototype. Their algorithm can be demonstrated by the following intuitive example. Imagine that we have two candidate answers for the question "What was the purpose of the Manhattan Project?": 1) "To develop a nuclear bomb"; 2) "To create an atomic weapon". These two answers support (triangulate) each other since they are semantically similar, but a straightforward frequency-count approach would not pick up this similarity. The advantage of triangulation over simple frequency counting is that it is more powerful for less "factual" questions, such as those that allow variations in the correct answers. In order to enjoy the full power of triangulation with factoid questions (e.g., "Who is the CEO of IBM?"), the candidate answers have to be extracted from their sentences (e.g., Samuel Palmisano) so they can be more accurately compared with the other candidate answers (e.g., Sam Palmisano). That is why the meta engine also needs answer understanding capabilities, including such crucial capabilities as question interpretation and semantic verification that a candidate answer belongs to the desired category (a person in the example above).

Figure 5. The Meta approach to fact seeking.

Fact Seeking Meta Engine Prototype: Underlying Technologies and Architecture

In the first version of our prototype, we included several freely available demonstrational prototypes and popular commercial engines on the Web that have some QA (fact-seeking) capabilities, specifically START, AskJeeves, BrainBoost, and ASU QA (Table 1, Figures 1-4). We also added Wikipedia to the list; although it does not have QA capabilities, it provides good-quality factual information on a variety of topics, which adds power to our triangulation mechanism. Google was not used directly as a service, but BrainBoost and ASU QA already use it among the other major keyword search engines. The meta-search part of our system was based on the MetaSpider architecture (Chau et al., 2001; Chen et al., 2001). Multiple threads are launched to submit the query and fetch candidate answers from each service. After these results are obtained, the system performs answer extraction, triangulation, and semantic verification.
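
A minimal sketch of the redundancy-based triangulation idea is given below: candidate answers from several services are grouped when they overlap lexically, and each group accumulates the weights of the services that proposed its members. The services' outputs, weights, and token-overlap test are invented for illustration; the paper's algorithm additionally performs answer extraction and semantic verification of answer types.

```python
# Sketch of redundancy-based triangulation of candidate answers returned by
# several QA services (made-up data; a simplification of the paper's method).

# (service, candidate answer, service weight) -- hypothetical outputs
candidates = [
    ("START", "Samuel Palmisano", 1.0),
    ("BrainBoost", "Sam Palmisano", 0.9),
    ("AskJeeves", "Lou Gerstner", 0.8),
    ("ASU QA", "Samuel J. Palmisano", 0.9),
]


def similar(a, b):
    """Two candidates triangulate each other if they share enough tokens."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / min(len(ta), len(tb)) >= 0.5


def triangulate(cands):
    groups = []  # each group: representative answer plus accumulated score
    for _, ans, weight in cands:
        for g in groups:
            if similar(ans, g["rep"]):
                g["score"] += weight
                break
        else:
            groups.append({"rep": ans, "score": weight})
    return sorted(groups, key=lambda g: g["score"], reverse=True)


for g in triangulate(candidates):
    print(f'{g["rep"]}: {g["score"]:.1f}')
# The "Samuel Palmisano" variants reinforce each other and outrank the outlier.
```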


Hawaii International Conference on System Sciences | 2000

Information navigation by clustering and summarizing query results

Dmitri Roussinov; Michael J. McQuaid

We have explored and evaluated a novel approach to information seeking grounded in the idea of summarizing query results through automated document clustering. The user starts with a natural language description of the needed information and navigates the information space through interaction with the system. We implemented a prototype allowing searches over a significant portion of the entire World Wide Web. In a laboratory experiment, subjects searched the WWW for answers to a given set of questions. Our results indicate that our prototype improved search performance, presumably through better understanding of query results. In addition, we analyzed interaction patterns and the effects of such parameters as subject skills and task peculiarities.


Empirical Methods in Natural Language Processing | 2005

Mining Context Specific Similarity Relationships Using The World Wide Web

Dmitri Roussinov; Leon Zhao; Weiguo Fan

We have studied how a context-specific web corpus can be automatically created and mined for discovering semantic similarity relationships between terms (words or phrases) from a given collection of documents (the target collection). These relationships between terms can be used to adjust the standard vector space representation so as to improve the accuracy of similarity computation between text documents in the target collection. Our experiments with a standard test collection (Reuters) revealed a reduction in similarity errors of up to 50%, twice the improvement achieved by other known techniques.
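
The effect of such mined relationships on similarity computation can be sketched by folding a term-term similarity matrix into the document vectors before taking cosines; the matrix values below are invented, and the weighting is a simplified stand-in for the paper's adjustment of the vector space representation.

```python
# Sketch: adjust bag-of-words vectors with a mined term-term similarity
# matrix before comparing documents (illustrative values only).
import numpy as np

terms = ["car", "automobile", "banana"]

# Hypothetical similarities mined from a context-specific web corpus.
S = np.array([
    [1.0, 0.8, 0.0],   # car
    [0.8, 1.0, 0.0],   # automobile
    [0.0, 0.0, 1.0],   # banana
])

d1 = np.array([1.0, 0.0, 0.0])  # document mentioning "car"
d2 = np.array([0.0, 1.0, 0.0])  # document mentioning "automobile"


def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


print("plain cosine:   ", cosine(d1, d2))          # 0.0 -- no shared terms
print("adjusted cosine:", cosine(d1 @ S, d2 @ S))  # > 0 via car ~ automobile
```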


Journal of Enterprise Information Management | 2004

Text clustering and summary techniques for CRM message management

Dmitri Roussinov; J. Leon Zhao

Customer relationship management (CRM) activities include soliciting customer feedback on product and service quality and resolving customer complaints. Inevitably, companies must deal with a large number of CRM messages from their customers, whether through e-mail or from work logs. Going through those messages is an important but tedious task for managers or CRM specialists who need to make strategic plans about where to place resources to achieve better CRM results. In this paper, we present a methodology for making sense of CRM messages based on text clustering and summary techniques. The distinctive features of CRM messages are their short length and the frequent availability of correlated CRM ratings. We propose several novel techniques, including an organizational concept space, Web mining of similarity relationships between concepts, and correlated analysis of text and ratings. We have tested the basic concepts and techniques of CRM Sense Maker in a business setting where customer surveys are used to set strategic directions in customer services.
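
The correlated analysis of text and ratings can be illustrated by a tiny sketch that groups messages by an already-computed cluster label and ranks the clusters by mean customer rating; the labels and ratings are made up for illustration and are not the paper's data or method.

```python
# Sketch: rank message clusters by mean customer rating to see which themes
# drive low satisfaction (hypothetical cluster labels and ratings).
from collections import defaultdict

# (cluster label from text clustering, customer rating 1-5)
messages = [("billing", 2), ("billing", 1), ("delivery", 4),
            ("delivery", 5), ("billing", 2)]

by_cluster = defaultdict(list)
for label, rating in messages:
    by_cluster[label].append(rating)

for label, ratings in sorted(by_cluster.items(),
                             key=lambda kv: sum(kv[1]) / len(kv[1])):
    mean = sum(ratings) / len(ratings)
    print(f"{label}: mean rating {mean:.1f} over {len(ratings)} messages")
```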

Collaboration


Dive into Dmitri Roussinov's collaboration with his top co-authors:

Jose Robles (Arizona State University)

J. Leon Zhao (City University of Hong Kong)
