Ramanathan Narayanan | Researchain

Publication

Featured research published by Ramanathan Narayanan.

IEEE International Symposium on Workload Characterization | 2006

MineBench: A Benchmark Suite for Data Mining Workloads

Ramanathan Narayanan; Berkin Özisikyilmaz; Joseph Zambreno; Gokhan Memik; Alok N. Choudhary

Data mining constitutes an important class of scientific and commercial applications. Recent advances in data extraction techniques have created vast data sets, which require increasingly complex data mining algorithms to sift through them and generate meaningful information. The disproportionately slower rate of growth of computer systems has led to a sizeable performance gap between data mining systems and algorithms. The first step in closing this gap is to analyze these algorithms and understand their bottlenecks. With this knowledge, current computer architectures can be optimized for data mining applications. In this paper, we present MineBench, a publicly available benchmark suite containing fifteen representative data mining applications belonging to various categories such as clustering, classification, and association rule mining. We believe that MineBench will be of use to those looking to characterize and accelerate data mining workloads.

International Conference on Data Mining | 2011

Twitter Trending Topic Classification

Kathy Lee; Diana Palsetia; Ramanathan Narayanan; Md. Mostofa Ali Patwary; Ankit Agrawal; Alok N. Choudhary

With the increasing popularity of microblogging sites, we are in the era of information explosion. As of June 2011, about 200 million tweets are being generated every day. Although Twitter provides a real-time list of the most popular topics people tweet about, known as Trending Topics, it is often hard to understand what these trending topics are about. Therefore, it is important and necessary to classify these topics into general categories with high accuracy for better information retrieval. To address this problem, we classify Twitter Trending Topics into 18 general categories such as sports, politics, technology, etc. We experiment with two approaches for topic classification: (i) the well-known Bag-of-Words approach for text classification and (ii) network-based classification. In the text-based classification method, we construct word vectors from the trending topic definition and tweets, and the commonly used tf-idf weights are used to classify the topics using a Naive Bayes Multinomial classifier. In the network-based classification method, we identify the top 5 similar topics for a given topic based on the number of common influential users. The categories of the similar topics and the number of common influential users between the given topic and its similar topics are used to classify the given topic using a C5.0 decision tree learner. Experiments on a database of 768 randomly selected trending topics (over 18 classes) show that classification accuracy of up to 65% and 70% can be achieved using text-based and network-based classification modeling, respectively.
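The similar-topic step of the network-based method can be sketched roughly as below. This is a minimal illustration only: the function name `top_similar_topics`, the dictionary layout, and the tie-breaking rule are assumptions, and the abstract does not say how influential users are identified or how ties are resolved.

```python
def top_similar_topics(target, influencers_by_topic, k=5):
    """Rank other topics by the number of influential users shared with `target`.

    `influencers_by_topic` maps a topic name to a set of user ids
    (a hypothetical layout chosen for this sketch).
    """
    target_users = influencers_by_topic[target]
    scored = []
    for topic, users in influencers_by_topic.items():
        if topic == target:
            continue
        common = len(target_users & users)  # size of the set intersection
        if common > 0:
            scored.append((topic, common))
    # Most shared users first; ties broken alphabetically (an assumption).
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:k]
```

The returned pairs (similar topic, number of common influential users) correspond to the kind of features the abstract says are passed on to the decision tree learner.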

Design, Automation, and Test in Europe | 2007

An FPGA Implementation of Decision Tree Classification

Ramanathan Narayanan; Daniel Honbo; Gokhan Memik; Alok N. Choudhary; Joseph Zambreno

Data mining techniques are a rapidly emerging class of applications that have widespread use in several fields. One important problem in data mining is classification, which is the task of assigning objects to one of several predefined categories. Among the several solutions developed, decision tree classification (DTC) is a popular method that yields high accuracy while handling large datasets. However, DTC is a computationally intensive algorithm, and as data sizes increase, its running time can stretch to several hours. In this paper, we propose a hardware implementation of decision tree classification. We identify the compute-intensive kernel (Gini score computation) in the algorithm, and develop a highly efficient architecture, which is further optimized by reordering the computations and by using a bitmapped data structure. Our implementation on a Xilinx Virtex-II Pro FPGA platform (with 16 Gini units) provides up to 5.58x performance improvement over an equivalent software implementation.
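For readers unfamiliar with the Gini score, the following software sketch shows the standard textbook computation for a binary split on one numeric attribute. It is not the paper's hardware architecture and does not model the reordering or the bitmapped data structure; the function names and the exhaustive threshold search are assumptions made for illustration.

```python
from collections import Counter

def gini_impurity(labels):
    """Gini impurity 1 - sum(p_c ** 2) over the classes present in `labels`."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def gini_of_split(values, labels, threshold):
    """Weighted Gini impurity after splitting records at value <= threshold."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    return (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)

def best_threshold(values, labels):
    """Try every distinct attribute value and return the lowest-Gini threshold."""
    return min(sorted(set(values)), key=lambda t: gini_of_split(values, labels, t))
```

Every candidate threshold requires a pass over the class counts, which gives some intuition for why this kernel can dominate running time on large datasets.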

Data Mining in Bioinformatics | 2011

A lung cancer outcome calculator using ensemble data mining on SEER data

Ankit Agrawal; Sanchit Misra; Ramanathan Narayanan; Lalith Polepeddi; Alok N. Choudhary

We analyze the lung cancer data available from the SEER program with the aim of developing accurate survival prediction models for lung cancer using data mining techniques. Carefully designed preprocessing steps resulted in removal/modification/splitting of several attributes, and 2 of the 11 derived attributes were found to have significant predictive power. Several data mining classification techniques were used on the preprocessed data along with various data mining optimizations and validations. In our experiments, ensemble voting of five decision tree based classifiers and meta-classifiers was found to result in the best prediction performance in terms of accuracy and area under the ROC curve. Further, we have developed an on-line lung cancer outcome calculator for estimating risk of mortality after 6 months, 9 months, 1 year, 2 years, and 5 years of diagnosis, for which a smaller non-redundant subset of 13 attributes was carefully selected using attribute selection techniques, while trying to retain the predictive power of the original set of attributes. The on-line lung cancer outcome calculator developed as a result of this study is available at http://info.eecs.northwestern.edu:8080/LungCancerOutcome-Calculator/
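One simple way to picture ensemble voting is plain majority voting over the member classifiers' predicted labels, sketched below. The abstract does not specify the voting rule, so majority voting, the tie-breaking choice, and the names `majority_vote` and `ensemble_predict` are all assumptions for illustration.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the most frequent label; ties go to the label seen first."""
    counts = Counter(predictions)
    top = max(counts.values())
    for label in predictions:
        if counts[label] == top:
            return label

def ensemble_predict(classifiers, record):
    """Ask each classifier (any callable taking a record) and combine by majority."""
    return majority_vote([clf(record) for clf in classifiers])
```

Here each classifier is just a callable, so trained models from any library could be wrapped in a small function before being passed in.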

IEEE International Symposium on Workload Characterization | 2006

An Architectural Characterization Study of Data Mining and Bioinformatics Workloads

Berkin Özisikyilmaz; Ramanathan Narayanan; Joseph Zambreno; Gokhan Memik; Alok N. Choudhary

Data mining is the process of automatically finding implicit, previously unknown, and potentially useful information from large volumes of data. Advances in data extraction techniques have resulted in a tremendous increase in the input data size of data mining applications. Data mining systems, on the other hand, have been unable to maintain the same rate of growth. Therefore, there is an increasing need to understand the bottlenecks associated with the execution of these applications in modern architectures. In this paper, we present MineBench, a publicly available benchmark suite containing fifteen representative data mining applications belonging to various categories: classification, clustering, association rule mining, and optimization. First, we highlight the uniqueness of data mining applications. Subsequently, we evaluate the MineBench applications on an 8-way shared memory (SMP) machine and analyze important performance characteristics such as L1 and L2 cache miss rates, branch misprediction rates

Biological Knowledge Discovery and Data Mining | 2012

Lung cancer survival prediction using ensemble data mining on SEER data

Ankit Agrawal; Sanchit Misra; Ramanathan Narayanan; Lalith Polepeddi; Alok N. Choudhary

We analyze the lung cancer data available from the SEER program with the aim of developing accurate survival prediction models for lung cancer. Carefully designed preprocessing steps resulted in removal/modification/splitting of several attributes, and 2 of the 11 derived attributes were found to have significant predictive power. Several supervised classification methods were used on the preprocessed data along with various data mining optimizations and validations. In our experiments, ensemble voting of five decision tree based classifiers and meta-classifiers was found to result in the best prediction performance in terms of accuracy and area under the ROC curve. We have developed an on-line lung cancer outcome calculator for estimating the risk of mortality after 6 months, 9 months, 1 year, 2 years, and 5 years of diagnosis, for which a smaller non-redundant subset of 13 attributes was carefully selected using attribute selection techniques, while trying to retain the predictive power of the original set of attributes. Further, ensemble voting models were also created for predicting the conditional survival outcome for lung cancer (estimating the risk of mortality after 5 years of diagnosis, given that the patient has already survived for a period of time) and were included in the calculator. The on-line lung cancer outcome calculator developed as a result of this study is available at http://info.eecs.northwestern.edu:8080/LungCancerOutcomeCalculator/.

Field-Programmable Technology | 2007

Design and Implementation of an FPGA Architecture for High-Speed Network Feature Extraction

Sailesh Pati; Ramanathan Narayanan; Gokhan Memik; Alok N. Choudhary; Joseph Zambreno

Network feature extraction involves the storage and classification of network packet activity. Although primarily employed in network intrusion detection systems, feature extraction is also used to determine various other aspects of a network's behavior such as total traffic and average connection size. Current software methods used for extraction of network features fail to meet the performance requirements of next-generation high-speed networks. In this paper, we propose an FPGA-based reconfigurable architecture for feature extraction of large high-speed networks. Our design makes use of parallel rows of hash functions and sketch tables in order to process network packets at a very high throughput. We present a detailed description of our architecture and its implementation on a Xilinx Virtex-II Pro FPGA board, and provide cycle-accurate timing results for feature extraction of input networking benchmark data. Our results demonstrate real-world throughputs of as high as 3.32 Gbps, with speedups reaching 18x when compared to an equivalent software implementation.
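The idea of several rows, each with its own hash function and counter table, can be illustrated in software with a count-min sketch. Whether the paper's sketch tables follow exactly this scheme is an assumption; the class name, table sizes, and the use of salted BLAKE2b as the per-row hash are choices made only for this example.

```python
import hashlib

class CountMinSketch:
    """Approximate per-key counters: `rows` hash functions, one table row each."""

    def __init__(self, rows=4, width=1024):
        self.rows = rows
        self.width = width
        self.table = [[0] * width for _ in range(rows)]

    def _index(self, row, key):
        # Salting with the row number makes each row behave like a separate hash function.
        digest = hashlib.blake2b(
            key.encode("utf-8"), digest_size=8, salt=row.to_bytes(8, "little")
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, key, amount=1):
        """Record `amount` (e.g. one packet, or its byte count) for `key` in every row."""
        for r in range(self.rows):
            self.table[r][self._index(r, key)] += amount

    def estimate(self, key):
        """Minimum over rows; collisions can inflate it but never lower it."""
        return min(self.table[r][self._index(r, key)] for r in range(self.rows))
```

Because the rows are updated independently of one another, the per-row work is a natural candidate for parallel execution, which is the property a hardware design can exploit.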

ACM Symposium on Applied Computing | 2010

FANGS: high speed sequence mapping for next generation sequencers

Sanchit Misra; Ramanathan Narayanan; Simon Lin; Alok N. Choudhary

Next Generation Sequencing machines are generating millions of short DNA sequences (reads) every day. There is a need for efficient algorithms to map these sequences to the reference genome to identify SNPs or rare transcripts and to fulfill the dream of personalized medicine. We present a Fast Algorithm for Next Generation Sequencers (FANGS), which dynamically reduces the search space by using q-gram filtering and the pigeonhole principle to rapidly map 454-Roche reads onto a reference genome. FANGS is a sequential algorithm designed to find all the matches of a query sequence in the reference genome, tolerating a large number of mismatches or insertions/deletions. Using FANGS, we mapped 50000 reads with a total of 25 million nucleotides to the human genome in as little as 23.3 minutes on a typical desktop computer. Through our experiments, we found that FANGS is up to an order of magnitude faster than the state-of-the-art techniques for queries of length 500 allowing 5 mismatches or insertions/deletions.
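The general idea behind q-gram filtering with the pigeonhole principle can be sketched as follows: if a read matches with at most k mismatches, then among k+1 disjoint pieces of the read at least one must match exactly, so only positions hit by an exact piece need full verification. This is a simplified illustration, not the FANGS algorithm itself: it handles mismatches only (no insertions/deletions), and the function names and index layout are assumptions.

```python
from collections import defaultdict

def build_index(reference, q):
    """Map every q-gram of the reference to the list of positions where it occurs."""
    index = defaultdict(list)
    for i in range(len(reference) - q + 1):
        index[reference[i:i + q]].append(i)
    return index

def map_read(read, reference, index, q, max_mismatches):
    """Return (start, mismatches) for every placement within the mismatch budget."""
    pieces = max_mismatches + 1  # pigeonhole: k errors cannot touch all k+1 pieces
    if len(read) < pieces * q:
        raise ValueError("read too short for this q and mismatch budget")
    candidates = set()
    for p in range(pieces):
        offset = p * q
        for pos in index.get(read[offset:offset + q], ()):
            start = pos - offset  # where the whole read would begin
            if 0 <= start <= len(reference) - len(read):
                candidates.add(start)
    hits = []
    for start in sorted(candidates):  # verify only the filtered candidates
        window = reference[start:start + len(read)]
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits
```

The filter never discards a true match under these assumptions; it only reduces how many positions go through the comparatively expensive verification step.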

International Conference on Computational Advances in Bio and Medical Sciences | 2011

Poster: A lung cancer mortality risk calculator based on SEER data

Ankit Agrawal; Sanchit Misra; Ramanathan Narayanan; Lalith Polepeddi; Alok N. Choudhary

We analyze the lung cancer data available from the SEER program for developing survival prediction models using data mining techniques. The prototype mortality risk calculator developed as a result of this study is available at info.eecs.northwestern.edu:8080/CancerMortalityRiskCalculator

IEEE International Symposium on Parallel and Distributed Processing Workshops and PhD Forum | 2010

pFANGS: Parallel high speed sequence mapping for Next Generation 454-roche Sequencing reads

Sanchit Misra; Ramanathan Narayanan; Wei-keng Liao; Alok N. Choudhary; Simon Lin

Millions of DNA sequences (reads) are generated by Next Generation Sequencing machines every day. There is a need for high performance algorithms to map these sequences to the reference genome to identify single nucleotide polymorphisms or rare transcripts and to fulfill the dream of personalized medicine. In this paper, we present a high-throughput parallel sequence mapping program, pFANGS. pFANGS is designed to find all the matches of a query sequence in the reference genome, tolerating a large number of mismatches or insertions/deletions. pFANGS partitions the computational workload and data among all the processes and employs load-balancing mechanisms to ensure better process efficiency. Our experiments show that, with 512 processors, we are able to map approximately 31 million 454/Roche queries of length 500 each to a reference human genome per hour, allowing 5 mismatches or insertions/deletions at full sensitivity. We also report and compare the performance results of two alternative parallel implementations of pFANGS: a shared memory OpenMP implementation and an MPI-OpenMP hybrid implementation.
