Praveen Pathak | Researchain

Publication

Featured research published by Praveen Pathak.

Journal of the Association for Information Science and Technology | 2004

The effects of fitness functions on genetic programming-based ranking discovery for Web search

Weiguo Fan; Edward A. Fox; Praveen Pathak; Harris Wu

Genetic-based evolutionary learning algorithms, such as genetic algorithms (GAs) and genetic programming (GP), have been applied to information retrieval (IR) since the 1980s. Recently, GP has been applied to a new IR task, the discovery of ranking functions for Web search, and has achieved very promising results. However, our prior research used only one fitness function for GP-based learning. It is unclear how other fitness functions may affect ranking function discovery for Web search, especially since choosing a proper fitness function is well known to be very important for the effectiveness and efficiency of evolutionary algorithms. In this article, we report our experience in contrasting different fitness function designs for GP-based learning using a very large Web corpus. Our results indicate that the design of the fitness function is instrumental in improving performance. We also give recommendations on the design of fitness functions for genetic-based information retrieval experiments.
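
The abstract does not say which fitness functions were contrasted. As a minimal illustration of why the choice matters, the sketch below scores two candidate rankings with two standard IR measures, average precision and precision at k. The function names and toy data are hypothetical. The two measures disagree about which ranking is better, so a GP run driven by one would tend to favor different ranking functions than a run driven by the other.

```python
from typing import Sequence, Set


def fitness_avg_precision(ranking: Sequence[str], relevant: Set[str]) -> float:
    """Average precision: mean of precision@i taken at each relevant document."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for i, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / i
    return total / len(relevant)


def fitness_precision_at_k(ranking: Sequence[str], relevant: Set[str], k: int = 3) -> float:
    """Fraction of the top-k documents that are relevant."""
    return sum(1 for doc in ranking[:k] if doc in relevant) / k


# Hypothetical toy data: two relevant documents, two candidate rankings.
relevant = {"d1", "d4"}
ranking_a = ["d1", "d2", "d3", "d4"]  # one relevant document at rank 1
ranking_b = ["d2", "d1", "d4", "d3"]  # both relevant documents in the top 3

for name, r in (("a", ranking_a), ("b", ranking_b)):
    print(name, round(fitness_avg_precision(r, relevant), 3),
          round(fitness_precision_at_k(r, relevant), 3))
# Average precision prefers ranking a (0.75 vs 0.583);
# precision@3 prefers ranking b (0.667 vs 0.333).
```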

IEEE Transactions on Knowledge and Data Engineering | 2004

Discovery of context-specific ranking functions for effective information retrieval using genetic programming

Weiguo Fan; Michael D. Gordon; Praveen Pathak

The Internet and corporate intranets have made vast amounts of information available, and people usually resort to search engines to find the information they need. However, these systems tend to use a single fixed ranking strategy regardless of context. This causes serious performance problems once the characteristics of different users, queries, and text collections are taken into account. We argue that the ranking strategy should be context specific, and we propose a new systematic method, based on genetic programming (GP), that automatically generates ranking strategies for different contexts. The new method was tested on TREC data, and the results are very promising.
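
In GP-based ranking discovery, a candidate ranking function is commonly encoded as an expression tree whose leaves are term statistics and whose internal nodes are operators. The sketch below assumes a small, hypothetical terminal set (`tf`, `df`, `N`) and operator set, and only evaluates a tree; the evolutionary loop is omitted. Under the context-specific approach described above, one would presumably evolve a separate tree for each context (a user, a query type, or a collection).

```python
import math
from typing import Dict, List, Tuple, Union

# A GP individual: a terminal name, a numeric constant, or (operator, child, ...).
Expr = Union[str, float, Tuple]


def evaluate(expr: Expr, feats: Dict[str, float]) -> float:
    """Recursively evaluate a tree for one query term in one document."""
    if isinstance(expr, str):
        return feats[expr]                     # terminal: a term statistic
    if isinstance(expr, (int, float)):
        return float(expr)                     # terminal: a constant
    op, *args = expr
    vals = [evaluate(a, feats) for a in args]
    if op == "+":
        return vals[0] + vals[1]
    if op == "*":
        return vals[0] * vals[1]
    if op == "/":
        # Protected division, a common GP convention so no tree can crash.
        return vals[0] / vals[1] if vals[1] != 0 else 1.0
    if op == "log":
        # Protected log: returns 0 for non-positive inputs.
        return math.log(vals[0]) if vals[0] > 0 else 0.0
    raise ValueError(f"unknown operator {op!r}")


def score(expr: Expr, matched_terms: List[Dict[str, float]]) -> float:
    """Document score: sum of the tree's value over the query terms it contains."""
    return sum(evaluate(expr, f) for f in matched_terms)


# tf * log(N / df), i.e. classic tf-idf, written as one possible individual.
TFIDF: Expr = ("*", "tf", ("log", ("/", "N", "df")))

doc_terms = [{"tf": 3, "df": 10, "N": 1000}, {"tf": 1, "df": 100, "N": 1000}]
print(round(score(TFIDF, doc_terms), 4))
```

Encoding tf-idf as one individual shows that familiar weighting schemes sit inside the GP search space; mutation and crossover on such trees can then explore variants of them.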

Information Systems Research | 2011

The Impact of Automation of Systems on Medical Errors: Evidence from Field Research

Ravi Aron; Shantanu Dutta; Ramkumar Janakiraman; Praveen Pathak

We use panel data from multiple wards from two hospitals spanning a three-year period to investigate the impact of automation of the core error prevention functions in hospitals on medical error rates. Although there are studies based on anecdotal evidence and self-reported data on how automation impacts medical errors, no systematic studies exist that are based on actual error rates from hospitals. Further, there is no systematic evidence on how incremental automation over time and across multiple wards impacts the rate of medical errors. The primary objective of our study is to fill this gap in the literature by empirically examining how the automation of core error prevention functions affects two types of medical errors. We draw on the medical informatics literature and principal-agency theory and use a unique panel data set of actual documented medical errors from two major hospitals to analyze the interplay between automation and medical errors. We hypothesize that the automation of the sensing function (recording and observing agent actions) will have the greatest impact on reducing error rates. We show that there are significant complementarities between quality management training imparted to hospital staff and the automation of control systems in reducing interpretative medical errors. We also offer insights to practitioners and theoreticians alike on how the automation of error prevention functions can be combined with training in quality management to yield better outcomes. Our results suggest an optimal implementation path for the automation of error prevention functions in hospitals.

Management Science | 2010

Detecting Management Fraud in Public Companies

Mark Cecchini; Haldun Aytug; Gary J. Koehler; Praveen Pathak

This paper provides a methodology for detecting management fraud using basic financial data. The methodology is based on support vector machines. An important aspect therein is a kernel that increases the power of the learning machine by allowing an implicit and generally nonlinear mapping of points, usually into a higher dimensional feature space. A kernel specific to the domain of finance is developed. This financial kernel constructs features shown in prior research to be helpful in detecting management fraud. A large empirical data set was collected, which included quantitative financial attributes for fraudulent and nonfraudulent public companies. Support vector machines using the financial kernel correctly labeled 80% of the fraudulent cases and 90.6% of the nonfraudulent cases on a holdout set. Furthermore, we replicate other leading fraud research studies using our data and find that our method has the highest accuracy on fraudulent cases and competitive accuracy on nonfraudulent cases. The results validate the financial kernel together with support vector machines as a useful method for discriminating between fraudulent and nonfraudulent companies using only publicly available quantitative financial attributes. The results also show that the methodology has predictive value because, using only historical data, it was able to distinguish fraudulent from nonfraudulent companies in subsequent years.
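
The idea that a kernel gives an implicit, generally nonlinear mapping into a higher-dimensional feature space can be checked numerically for a standard kernel. The sketch below uses the homogeneous degree-2 polynomial kernel, not the paper's financial kernel, whose construction the abstract does not give. It verifies that k(x, z) = (x·z)² equals an ordinary dot product after an explicit map of 2-D inputs into three dimensions; `phi` and the sample points are illustrative.

```python
import math
from typing import Sequence, Tuple


def poly2_kernel(x: Sequence[float], z: Sequence[float]) -> float:
    """Homogeneous degree-2 polynomial kernel, computed directly in input space."""
    return sum(a * b for a, b in zip(x, z)) ** 2


def phi(x: Sequence[float]) -> Tuple[float, float, float]:
    """Explicit 2-D -> 3-D feature map whose dot product reproduces poly2_kernel."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


x, z = (1.5, -2.0), (0.5, 3.0)
print(poly2_kernel(x, z), dot(phi(x), phi(z)))  # equal up to floating-point rounding
```

An SVM only needs the kernel values k(x, z), so the higher-dimensional map never has to be built explicitly. A domain-specific kernel such as the financial kernel plays the same role, with its implicit features chosen for the problem domain.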

Decision Support Systems | 2010

Making words work: Using financial text as a predictor of financial events

Mark Cecchini; Haldun Aytug; Gary J. Koehler; Praveen Pathak

We develop a methodology for automatically analyzing text to help discriminate firms that encounter catastrophic financial events. The dictionaries we create from the Management's Discussion and Analysis (MD&A) sections of 10-Ks discriminate fraudulent from non-fraudulent firms 75% of the time and bankrupt from nonbankrupt firms 80% of the time. Our results compare favorably with quantitative prediction methods. We further test for complementarities by merging quantitative data with text data. We achieve our best prediction results for both bankruptcy (83.87%) and fraud (81.97%) with the combined data, showing that the text of the MD&A complements the quantitative financial information.
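
The abstract does not describe how the dictionaries are built or how the text is weighted. As a rough sketch under those gaps, the code below turns an MD&A passage into relative frequencies of dictionary terms and then appends quantitative attributes, mirroring the step of merging quantitative data with text data. The dictionary words, the ratio names, and the relative-frequency weighting are all hypothetical.

```python
import re
from collections import Counter
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    """Lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def dictionary_features(text: str, dictionary: List[str]) -> List[float]:
    """Relative frequency of each dictionary term in one MD&A passage."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    n = len(tokens) or 1  # avoid division by zero on empty text
    return [counts[term] / n for term in dictionary]


def combined_features(text: str, dictionary: List[str],
                      quantitative: Dict[str, float]) -> List[float]:
    """Text features followed by quantitative attributes in sorted-name order."""
    return dictionary_features(text, dictionary) + [quantitative[k] for k in sorted(quantitative)]


# Hypothetical dictionary and financial attributes.
dictionary = ["restatement", "litigation", "decline"]
mdna = "Revenue decline continued; litigation costs rose. Decline expected."
ratios = {"debt_to_assets": 0.6, "current_ratio": 1.2}
print(combined_features(mdna, dictionary, ratios))
```

The combined vector could then be passed to any classifier; which classifier produced the reported results is not stated in this abstract.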

Hawaii International Conference on System Sciences | 2004

Ranking function optimization for effective Web search by genetic programming: an empirical study

Weiguo Fan; Michael D. Gordon; Praveen Pathak; Wensi Xi; Edward A. Fox

Web search engines have become indispensable in daily life for finding the information we need. Although search engines respond very quickly, their effectiveness at placing useful, relevant documents at the top of the hit list needs to be improved. In this paper, we report our experience applying genetic programming (GP) to the ranking function discovery problem, leveraging the structural information of HTML documents. Our experiments using the Web track data from recent TREC conferences show that we can discover better ranking functions than well-known existing IR ranking strategies such as Okapi and Ptfidf. The performance is even comparable to that obtained with support vector machines.
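
GP ranking functions that use HTML structure need term statistics for each document field. The sketch below uses Python's standard `html.parser` module to count a term separately in the `<title>` element and in the rest of the text. Treating only title versus body as fields is a simplifying assumption, since the abstract does not list which structural elements were used; `FieldCounter` and `field_tf` are hypothetical names.

```python
import re
from collections import Counter
from html.parser import HTMLParser
from typing import Dict


class FieldCounter(HTMLParser):
    """Counts word frequencies separately for <title> text and all other text."""

    def __init__(self) -> None:
        super().__init__()
        self.in_title = False
        self.fields: Dict[str, Counter] = {"title": Counter(), "body": Counter()}

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        field = "title" if self.in_title else "body"
        self.fields[field].update(re.findall(r"[a-z0-9]+", data.lower()))


def field_tf(html: str, term: str) -> Dict[str, int]:
    """Term frequency of `term` in each field of one HTML document."""
    parser = FieldCounter()
    parser.feed(html)
    parser.close()
    return {name: counts[term] for name, counts in parser.fields.items()}


page = ("<html><head><title>Genetic programming</title></head>"
        "<body><h1>Ranking</h1><p>Genetic programming evolves ranking functions.</p>"
        "</body></html>")
print(field_tf(page, "genetic"), field_tf(page, "ranking"))
```

Per-field counts like these could serve as separate GP terminals (a title tf and a body tf, for instance), leaving it to evolution to decide how much each field should count.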

Decision Support Systems | 2006

On linear mixture of expert approaches to information retrieval

Weiguo Fan; Michael D. Gordon; Praveen Pathak

Knowledge-intensive organizations hold a vast array of information in large document repositories. With the advent of e-commerce and corporate intranets/extranets, these repositories are expected to grow at a fast pace. This explosive growth has led to huge, fragmented, and unstructured document collections. Although it has become easier to collect and store information in document collections, it has become increasingly difficult to retrieve relevant information from them. Information retrieval systems help users identify documents relevant to their information needs. Matching functions compare the information in documents with the information users request through queries, producing a set of documents to present to the users. It is well known that no single matching function produces the best retrieval results in all contexts (documents and queries). In this paper we combine the results obtained from well-known matching functions in the literature. We employ genetic algorithms to perform these combinations and test our method on a large, well-known document dataset. Our method produces better retrieval results for both the consensus search and the routing tasks in information retrieval.
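
A linear mixture of matching functions scores each document as a weighted sum of the scores the individual functions assign, and a genetic algorithm searches for the weights. The sketch below assumes average precision as the fitness, truncation selection, arithmetic crossover, and Gaussian mutation; these operators and the toy scores are illustrative choices, not necessarily the paper's. Seeding the population with one-hot weight vectors (each matching function alone) and always keeping the elite means the returned mixture is never worse than the best single function on the training data.

```python
import random
from typing import List, Sequence, Set, Tuple


def average_precision(ranking: Sequence[int], relevant: Set[int]) -> float:
    hits, total = 0, 0.0
    for i, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / i
    return total / len(relevant) if relevant else 0.0


def fused_ranking(scores: List[List[float]], w: Sequence[float]) -> List[int]:
    """Rank documents by sum_k w[k] * scores[k][d]; scores[k] comes from matching function k."""
    n_docs = len(scores[0])
    fused = [sum(wk * s[d] for wk, s in zip(w, scores)) for d in range(n_docs)]
    return sorted(range(n_docs), key=lambda d: -fused[d])


def evolve_weights(scores: List[List[float]], relevant: Set[int], pop_size: int = 20,
                   generations: int = 30, seed: int = 0) -> Tuple[List[float], float]:
    rng = random.Random(seed)
    k = len(scores)

    def fitness(w: Sequence[float]) -> float:
        return average_precision(fused_ranking(scores, w), relevant)

    # One-hot individuals (each matching function alone) plus random mixtures.
    pop = [[1.0 if i == j else 0.0 for i in range(k)] for j in range(k)]
    pop += [[rng.random() for _ in range(k)] for _ in range(pop_size - k)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        elite = pop[: pop_size // 2]            # truncation selection; elite survives
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            alpha = rng.random()                 # arithmetic crossover
            child = [alpha * x + (1 - alpha) * y for x, y in zip(a, b)]
            j = rng.randrange(k)                 # Gaussian mutation of one weight
            child[j] = max(0.0, child[j] + rng.gauss(0.0, 0.2))
            children.append(child)
        pop = elite + children
    best = max(pop, key=fitness)
    return best, fitness(best)


# Hypothetical scores from two matching functions over four documents.
scores = [[0.8, 0.0, 0.9, 0.1],   # function A ranks docs 2, 0, 3, 1
          [0.0, 0.8, 0.1, 0.9]]   # function B ranks docs 3, 1, 2, 0
relevant = {2, 3}
weights, fit = evolve_weights(scores, relevant)
print([round(w, 3) for w in weights], round(fit, 3))
```

On this toy data each function alone reaches an average precision of 5/6, while an equal-weight mixture ranks both relevant documents first (average precision 1.0), which is the kind of gain a learned combination is meant to capture.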

Decision Support Systems | 2006

Nonlinear ranking function representations in genetic programming-based ranking discovery for personalized search

Weiguo Fan; Praveen Pathak; Linda G. Wallace

The ranking function is instrumental to the performance of a search engine. Designing and optimizing a search engine's ranking function remains a daunting task for computer and information scientists. Recently, genetic programming (GP), a machine learning technique based on evolutionary theory, has shown promise in tackling this very difficult problem. Ranking functions discovered by GP have been found to be significantly better than many other existing ranking functions. However, current GP implementations for ranking function discovery are all designed around the Vector Space model, in which the same term weighting strategy is applied to all terms in a document. This may not be an ideal representation at the individual query level, since many query terms should play different roles in the final ranking. In this paper, we propose a novel nonlinear ranking function representation scheme and compare it to the well-known Vector Space model. We show theoretically that the new representation scheme subsumes the traditional Vector Space model representation as a special case and hence allows additional flexibility in term weighting. We test the new representation scheme with the GP-based discovery framework in a personalized search (information routing) context using a TREC Web corpus. The experimental results show that the new ranking function representation outperforms the traditional Vector Space model for GP-based ranking function discovery.
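
The special-case argument can be illustrated with a simplified scoring model. In the sketch below, a vector-space style scorer applies one weighting function to every query term, while a more general scorer lets each query term carry its own function; when every term is given the same function, the two coincide. This per-term model is an illustrative assumption and does not reproduce the paper's nonlinear representation, which the abstract does not specify.

```python
import math
from typing import Callable, Dict, List

TermStats = Dict[str, float]              # e.g. {"tf": ..., "df": ..., "N": ...}
Weighting = Callable[[TermStats], float]


def tfidf(s: TermStats) -> float:
    return s["tf"] * math.log(s["N"] / s["df"])


def log_tf(s: TermStats) -> float:
    return math.log1p(s["tf"])


def shared_score(doc: Dict[str, TermStats], query: List[str], w: Weighting) -> float:
    """Vector-space style: the same weighting function for every query term."""
    return sum(w(doc[t]) for t in query if t in doc)


def per_term_score(doc: Dict[str, TermStats], query: List[str],
                   w_by_term: Dict[str, Weighting]) -> float:
    """Generalization: each query term may use its own weighting function."""
    return sum(w_by_term[t](doc[t]) for t in query if t in doc)


doc = {"genetic": {"tf": 2, "df": 50, "N": 1000},
       "search": {"tf": 5, "df": 400, "N": 1000}}
query = ["genetic", "search"]

# Special case: identical functions for all terms reproduce the shared scorer.
same = per_term_score(doc, query, {t: tfidf for t in query})
print(same == shared_score(doc, query, tfidf))
# Extra flexibility: weight the frequent term "search" differently.
print(per_term_score(doc, query, {"genetic": tfidf, "search": log_tf}))
```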

Decision Support Systems | 2006

An integrated two-stage model for intelligent information routing

Weiguo Fan; Michael D. Gordon; Praveen Pathak

A recent surge in subscriptions to online news services exemplifies the fact that people and organizations constantly need up-to-date information to stay competitive and make better-informed decisions. However, many of these news services require users either to enter their profiles manually or to subscribe to existing news channels. This results in a lack of intelligence and personalization, which makes these services less attractive to users. In this paper, we propose an integrated model that combines query expansion with ranking function adaptation for online information routing and test it on two different large-scale corpora. The experimental results show that this new model delivers much better quality information than existing models.
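
The abstract does not say how query expansion is performed. One simple, widely used form, shown here as an assumption, appends the most frequent new terms from documents the user has marked relevant. A second function then scores incoming documents against the expanded profile with per-term weights, standing in for the ranking-function adaptation stage. All names and data are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def expand_profile(query: List[str], relevant_docs: List[List[str]], n_terms: int = 2) -> List[str]:
    """Stage 1: append the n most frequent new terms from relevant documents."""
    counts = Counter(t for doc in relevant_docs for t in doc if t not in query)
    return query + [t for t, _ in counts.most_common(n_terms)]


def route_score(doc: List[str], profile: List[str], weights: Dict[str, float]) -> float:
    """Stage 2: weighted term-frequency match; adapted weights, default 1.0."""
    tf = Counter(doc)
    return sum(weights.get(t, 1.0) * tf[t] for t in profile)


profile = expand_profile(["genetic", "programming"],
                         [["genetic", "programming", "ranking", "search"],
                          ["ranking", "functions", "search", "ranking"]])
print(profile)
print(route_score(["ranking", "search", "news"], profile, {"ranking": 0.5}))
```

In a full system the weights in the second stage would be learned rather than fixed by hand; the point of the sketch is only the ordering of the two stages.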

Decision Support Systems | 2009

Genetic-based approaches in ranking function discovery and optimization in information retrieval - A framework

Weiguo Fan; Praveen Pathak; Mi Zhou

An information retrieval (IR) system consists of a document collection, queries issued by users, and the matching/ranking functions used to rank documents in predicted order of relevance for a given query. A variety of ranking functions have been used in the literature, but studies show that these functions do not perform consistently well across different contexts. In this paper we propose a two-stage integrated framework for discovering and optimizing the ranking functions used in IR. The first stage, the discovery process, uses genetic programming to leverage the structural and statistical information available in HTML documents and yield novel ranking functions. In the second stage, the optimization process, the document retrieval scores of various well-known ranking functions are combined using genetic algorithms. The overall discovery and optimization framework is tested on the well-known TREC collection of Web documents for both the ad hoc retrieval task and the routing task. Using our framework, we observe a significant increase in retrieval performance compared with some well-known stand-alone ranking functions.