Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Jose Antonio Robles-Flores is active.

Publication


Featured research published by Jose Antonio Robles-Flores.


Communications of the ACM | 2008

Beyond keywords: Automated question answering on the web

Dmitri Roussinov; Weiguo Fan; Jose Antonio Robles-Flores

Beyond Google, emerging question-answering systems respond to natural-language queries.


Hawaii International Conference on System Sciences | 2005

Automated Question Answering From Lecture Videos: NLP vs. Pattern Matching

Jinwei Cao; Jose Antonio Robles-Flores; Dmitri Roussinov; Jay F. Nunamaker

This paper explores the feasibility of automated question answering from lecture video materials used in conjunction with PowerPoint slides. Two popular approaches to question answering are discussed, each separately tested on the text extracted from videotaped lectures: 1) the approach based on Natural Language Processing (NLP) and 2) a self-learning probabilistic pattern matching approach. The results of the comparison and our qualitative observations are presented. The advantages and shortcomings of each approach are discussed in the context of video applications for e-learning or knowledge management.
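As an illustration only (this code is not from the paper), here is a minimal sketch of the pattern-matching side of the comparison: hand-written answer patterns are applied to sentences extracted from a lecture transcript, and candidates are ranked by how often they match. The paper's self-learning probabilistic approach would instead induce and weight such patterns from training data; the patterns, names, and example sentences below are invented for demonstration.

```python
import re
from collections import Counter

# Hypothetical, hand-written answer patterns for "What is X?" questions.
# A self-learning probabilistic system would induce and weight such
# patterns from training data; they are hard-coded here for illustration.
PATTERNS = [
    r"{topic} is (?:a |an |the )?([\w\- ]+)",
    r"([\w\- ]+), (?:also )?(?:called|known as) {topic}",
]

def answer_what_is(topic: str, transcript_sentences: list[str]) -> list[tuple[str, int]]:
    """Rank candidate answers by how often the patterns match across sentences."""
    counts = Counter()
    for pattern in PATTERNS:
        regex = re.compile(pattern.format(topic=re.escape(topic)), re.IGNORECASE)
        for sentence in transcript_sentences:
            for match in regex.finditer(sentence):
                counts[match.group(1).strip().lower()] += 1
    return counts.most_common()

# Toy transcript sentences, e.g. extracted from lecture audio or slides:
sentences = [
    "A data warehouse is a collection of integrated data.",
    "Remember that the data warehouse is a collection of integrated data, nothing more.",
]
print(answer_what_is("data warehouse", sentences))
```

On the two toy sentences this returns "collection of integrated data" with a count of 2, illustrating the redundancy-based scoring that such pattern-matching systems rely on.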


Text REtrieval Conference (TREC) | 2005

Building on Redundancy: Factoid Question Answering, Robust Retrieval and the "Other"

Dmitri Roussinov; Elena Filatova; Michael Chau; Jose Antonio Robles-Flores

We have explored how redundancy-based techniques can be used to improve factoid question answering, definitional questions ("other"), and robust retrieval. For the factoids, we explored the meta approach: we submitted the questions to several open-domain question answering systems available on the Web and applied our redundancy-based triangulation algorithm to analyze their outputs in order to identify the most promising answers. Our results support the added value of the meta approach: the performance of the combined system surpassed the underlying performances of its components. To answer definitional ("other") questions, we looked for sentences containing re-occurring pairs of noun entities containing the elements of the target. For robust retrieval, we applied our redundancy-based Internet mining technique to identify the concepts (single-word terms or phrases) that were highly related to the topic (query) and expanded the queries with them. All our results are above the mean performance in the categories in which we participated, with one of our robust runs being the best in its category among all 24 participants. Overall, our findings support the hypothesis that using as much textual data as possible, specifically data mined from the World Wide Web, is extremely promising.

FACTOID QUESTION ANSWERING

The Natural Language Processing (NLP) task behind Question Answering (QA) technology is known to be Artificial Intelligence (AI) complete: it requires computers to be as intelligent as people, to understand the deep semantics of human communication, and to be capable of common-sense reasoning. As a result, different systems have different capabilities. They vary in the range of tasks that they support, the types of questions they can handle, and the ways in which they present the answers. Following the example of meta search engines on the Web (Selberg & Etzioni, 1995), we advocate combining several fact seeking engines into a single "meta" approach. Meta search engines (sometimes called metacrawlers) can take a query consisting of keywords (e.g. "rotary engines"), send it to several portals (e.g. Google, MSN, etc.), and then combine the results. This allows them to provide better coverage and specialization. Examples are MetaCrawler (Selberg & Etzioni, 1995), 37.com (www.37.com), and Dogpile (www.dogpile.com). Although keyword-based meta search engines have been suggested and explored in the past, we are not aware of a similar approach being tried for the task of open domain/corpus question answering (fact seeking).

The practical benefits of the meta approach are justified by a general consideration: eliminating the "weakest link" dependency. It does not rely on a single system, which may fail or may simply not be designed for a specific type of task (question). The meta approach promises higher coverage and recall of the correct answers, since different QA engines may cover different databases or different parts of the Web. In addition, the meta approach can reduce subjectivity by querying several engines; as in the real world, one can gather views from several people in order to make the answers more accurate and objective. The speed provided by several systems queried in parallel can also significantly exceed that obtained by working with only one system, since their responsiveness may vary with the task and network traffic conditions. In addition, the meta approach fits nicely into the increasingly popular Web services model, where each service (QA engine) is independently developed and maintained and the meta engine integrates them, while still being organizationally independent from them. Since each engine may be provided by a commercial company interested in increasing its advertising revenue or a research group showcasing its cutting-edge technology, the competition mechanism will also ensure quality and diversity among the services. Finally, a meta engine can be customized for a particular portal, such as those supporting business intelligence, education, or serving visually impaired or mobile phone users.

Figure 1. Example of START output.

Figure 2. Example of BrainBoost output.

Meta Approach Defined

We define a fact seeking meta engine as a system that can combine, analyze, and represent the answers obtained from several underlying systems (called answer services throughout our paper). At least some of these underlying services (systems) have to be capable of providing candidate answers to some types of questions asked in natural language form; otherwise the overall architecture would not be any different from a single fact seeking engine, which is typically based on a commercial keyword search engine, e.g. Google. The technology behind each of the answer services can be as complex as deep semantic NLP or as simple as shallow pattern matching.

Fact Seeking Service | Web address | Output Format | Organization/System | Performance in our evaluation (MRR)
START | start.csail.mit.edu | Single answer sentence | Research Prototype | 0.049**
AskJeeves | www.ask.com | Up to 200 ordered snippets | Commercial | 0.397**
BrainBoost | www.brainboost.com | Up to 4 snippets | Commercial | 0.409*
ASU QA on the Web | qa.wpcarey.asu.edu | Up to 20 ordered sentences | Research Prototype | 0.337**
Wikipedia | en.wikipedia.org | Narrative | Non-profit | 0.194**
ASU Meta QA | http://qa.wpcarey.asu.edu/ | Precise answer | Research Prototype | 0.435

Table 1. The fact seeking services involved, their characteristics, and their performance in the evaluation on the 2004 questions. * and ** indicate 0.1 and 0.05 levels of statistical significance of the difference from the best, respectively.

Challenges Faced and Addressed

Combining multiple fact seeking engines also faces several challenges. First, their output formats may differ: some engines produce an exact answer (e.g. START), while others present one sentence or an entire snippet (several sentences), similar to web search engines, as shown in Figures 1-4. Table 1 summarizes those differences and other capabilities of the popular fact seeking engines. Second, the accuracy of responses may differ overall and show even higher variability depending on the specific type of question. Finally, we have to deal with multiple answers, so removing duplicates and resolving answer variations is necessary. The issues involved in merging search results from multiple engines have already been explored by MetaCrawler (Selberg & Etzioni, 1995) and by fusion studies in information retrieval (e.g. Vogt & Cottrell, 1999), but only in the context of merging lists of retrieved text documents. We argue that the task of fusing multiple short answers, which may potentially conflict with or confirm each other, is fundamentally different and poses a new challenge for researchers. For example, some answer services (components) may be very precise (e.g. START) but cover only a small proportion of questions. They need to be backed up by less precise services that have higher coverage (e.g. AskJeeves). However, backing up may easily result in diluting the answer set with spurious (wrong) answers. Thus, there is a need for some kind of triangulation of the candidate answers provided by the different services, or of multiple candidate answers provided by the same service.

Figure 3. Example of AskJeeves output.

Figure 4. Example of ASU QA output.

Triangulation, a term widely used in intelligence and journalism, stands for confirming or disconfirming facts by using multiple sources. Roussinov et al. (2004) went one step further than the frequency counts explored earlier by Dumais et al. (2002) and groups involved in TREC competitions. They explored a more fine-grained triangulation process, which we also used in our prototype. Their algorithm can be demonstrated by the following intuitive example. Imagine that we have two candidate answers for the question "What was the purpose of the Manhattan Project?": 1) "To develop a nuclear bomb" and 2) "To create an atomic weapon". These two answers support (triangulate) each other since they are semantically similar. However, a straightforward frequency count approach would not pick up this similarity. The advantage of triangulation over simple frequency counting is that it is more powerful for less "factual" questions, such as those that may allow variations in the correct answers. In order to enjoy the full power of triangulation with factoid questions (e.g. Who is the CEO of IBM?), the candidate answers have to be extracted from their sentences (e.g. Samuel Palmisano), so they can be more accurately compared with the other candidate answers (e.g. Sam Palmisano). That is why the meta engine needs to possess answer understanding capabilities as well, including such crucial capabilities as question interpretation and semantic verification of the candidate answers to check that they belong to the desired category (person in the example above).

Figure 5. The Meta approach to fact seeking.

Fact Seeking Engine Meta Prototype: Underlying Technologies and Architecture

In the first version of our prototype, we included several freely available demonstrational prototypes and popular commercial engines on the Web that have some QA (fact seeking) capabilities, specifically START, AskJeeves, BrainBoost, and ASU QA (Table 1, Figures 1-4). We also added Wikipedia to the list. Although it does not have QA capabilities, it provides good quality factual information on a variety of topics, which adds power to our triangulation mechanism. Google was not used directly as a service, but BrainBoost and ASU QA already use it among the other major keyword search engines. The meta-search part of our system was based on the MetaSpider architecture (Chau et al., 2001; Chen et al., 2001). Multiple threads are launched to submit the query and fetch the candidate answers from each service. After these results are obtained, the system performs answer extraction, triangulation, and semantic verification.
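As a rough, illustrative sketch of the two ideas described above, querying several answer services in parallel and triangulating their candidate answers, the following Python snippet is not taken from the paper: the stand-in services, the string-similarity measure used in place of the authors' semantic triangulation, and every name in it are assumptions made for demonstration only.

```python
from concurrent.futures import ThreadPoolExecutor
from difflib import SequenceMatcher
from typing import Callable

# An answer service maps a natural-language question to candidate answer
# strings. Real services (START, AskJeeves, BrainBoost, ASU QA, Wikipedia)
# would be wrapped behind this interface; the ones below are stand-ins.
AnswerService = Callable[[str], list[str]]

def similarity(a: str, b: str) -> float:
    # Crude stand-in for semantic triangulation: a normalized string
    # similarity in [0, 1]; exact duplicates score 1.0.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def meta_answer(question: str, services: list[AnswerService]) -> list[tuple[str, float]]:
    """Query all services in parallel, then rank candidates by mutual support."""
    with ThreadPoolExecutor(max_workers=max(1, len(services))) as pool:
        per_service = list(pool.map(lambda s: s(question), services))
    candidates = [answer for answers in per_service for answer in answers]

    # Triangulation: each candidate's score is its summed similarity to all
    # other candidates, so paraphrases and duplicates reinforce each other.
    scored = [
        (c, sum(similarity(c, other) for other in candidates if other is not c))
        for c in candidates
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy usage with canned candidates instead of live services:
fake_services = [
    lambda q: ["To develop a nuclear bomb"],
    lambda q: ["To create an atomic weapon"],
]
print(meta_answer("What was the purpose of the Manhattan Project?", fake_services))
```

A fuller implementation would, as the paper describes, first extract exact answers from snippets and verify that they belong to the expected semantic category before comparing them.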


Journal of Internet and Enterprise Management | 2005

Web question answering: technology and applications to business intelligence

Dmitri Roussinov; Jose Antonio Robles-Flores

We introduce a novel and completely trainable approach to automated open domain Question Answering (QA) on the web, for the purpose of business intelligence. Our approach does not involve any linguistic resources and can be easily implemented within any information awareness system. When tested on standard test collections, the performance of our approach was found to be comparable with that of the current top-of-the-line, complex and expensive linguistic approaches, and significantly better than that of other completely trainable approaches. We also present the design of our ongoing empirical study and the qualitative observations from our pilot experiments.


Communications of the AIS | 2012

Examining question-answering technology from the task technology fit perspective

Jose Antonio Robles-Flores; Dmitri Roussinov

The World Wide Web has become a vital supplier of information for organizations in order to carry out such tasks as business intelligence, security monitoring, and risk assessments. By utilizing the task-technology fit (TTF) theory, we investigate the issue of when open-domain question-answering (QA) technology would potentially be superior to general-purpose Web search engines. Specifically, we argue theoretically and back up our arguments with a user study that the presence of fusion (information synthesis) is crucial to warrant the use of QA. At the same time, many information seeking tasks do not require fusion and, thus, are adequately served by traditional keyword search portals (Google, MSN, Yahoo, etc.). This explains why prior attempts to demonstrate the value of QA empirically were unsuccessful. We also discuss methodological challenges to any empirical investigation of QA and present several solutions to those challenges, validated with our user study. In order to carry out our study, we created a novel prototype by following the Design Science guidelines. Our prototype is the first of its kind and is capable of answering list questions, such as What companies own low orbit satellites? or In which cities have illegal methyl-methionine labs been found? This investigation is only a precursor to a full-scale empirical study, but it serves as a medium to overview state-of-the-art QA technologies and to introduce important theoretical and empirical concepts involved. Although we did not find empirical evidence that one technology is uniformly better than the other, we discovered that once the user accumulates experience using QA, he/she can make an intelligent decision about whether to use it for a particular task, which leads the user to be more productive on average with the same tasks compared to when there is no choice of technology.


International Journal of Business Intelligence Research | 2011

Strategies for Improving the Efficacy of Fusion Question Answering Systems

Jose Antonio Robles-Flores; Gregory Schymik; Julie Smith-David; Robert D. St. Louis

Web search engines typically retrieve a large number of web pages and overload business analysts with irrelevant information. One approach that has been proposed for overcoming some of these problems is automated Question Answering (QA). This paper describes a case study that was designed to determine the efficacy of QA systems for generating answers to original, fusion, list questions (questions that have not previously been asked and answered, questions for which the answer cannot be found on a single web site, and questions for which the answer is a list of items). Results indicate that QA algorithms are not very good at producing complete answer lists and that searchers are not very good at constructing answer lists from snippets. These findings indicate a need for QA research to focus on crowd sourcing answer lists and improving output format.


Text REtrieval Conference (TREC) | 2004

Experiments with Web QA System and TREC2004 Questions

Dmitri Roussinov; Jose Antonio Robles-Flores; Yin Ding


Decision Support Systems | 2007

Applying question answering technology to locating malevolent online content

Dmitri Roussinov; Jose Antonio Robles-Flores


Americas Conference on Information Systems | 2004

Web Question Answering: Technology and Business Applications

Dmitri Roussinov; Jose Antonio Robles-Flores


Journal of World Business | 2015

El Jefe: Differences in expected leadership behaviors across Latin American countries

Nathalie Castaño; Mary F. Sully de Luque; Tara Wernsing; Enrique Ogliastri; Rachel Gabel Shemueli; Rosa María Fuchs; Jose Antonio Robles-Flores

Collaboration


Dive into Jose Antonio Robles-Flores's collaborations.

Top Co-Authors

Carlos Ferran

Pennsylvania State University
