Daniel Honbo | Researchain

Publication

Featured research published by Daniel Honbo.

International Conference on Data Mining | 2011

SES: Sentiment Elicitation System for Social Media Data

Kunpeng Zhang; Yu Cheng; Yusheng Xie; Daniel Honbo; Ankit Agrawal; Diana Palsetia; Kathy Lee; Wei-keng Liao; Alok N. Choudhary

Social media is becoming a major and popular technological platform that allows users to discuss and share information. Information is generated and managed through either computers or mobile devices by one person and consumed by many others. Most of this user-generated content is textual, as on social networks (Facebook, LinkedIn), microblogging services (Twitter), and blogs (Blogspot, WordPress). Finding valuable nuggets of knowledge, such as capturing and summarizing sentiments from this huge amount of data, could help users make informed decisions. In this paper, we develop a sentiment identification system called SES, which implements three different sentiment identification algorithms. In the first algorithm, we augment basic compositional semantic rules. In the second algorithm, we argue that sentiment should not simply be classified as positive, negative, or objective, but should be given a continuous score that reflects its degree. All word scores are calculated from a large volume of customer reviews. Because of the special characteristics of social media texts, we propose a third algorithm that takes emoticons, negation word position, and domain-specific words into account. Furthermore, a machine learning model is applied to features derived from the outputs of the three algorithms. We conduct our experiments on user comments from Facebook and tweets from Twitter. The results show that Random Forest achieves better accuracy than decision tree, neural network, and logistic regression. We also propose a flexible way to represent document sentiment based on the sentiments of the sentences it contains. SES is available online.
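The third algorithm's use of emoticons and negation word position, together with the idea of deriving document sentiment from sentence sentiments, can be illustrated with a toy scorer. This is a minimal sketch, not SES itself: the word scores, emoticon scores, negation list, and three-token negation window are all hypothetical, and averaging sentence scores is only one simple way to obtain a document score.

```python
import re

# Hypothetical toy resources; SES derives its word scores from a large corpus of customer reviews.
WORD_SCORES = {"good": 1.0, "great": 2.0, "bad": -1.0, "terrible": -2.0}
EMOTICON_SCORES = {":)": 1.0, ":-)": 1.0, ":(": -1.0, ":-(": -1.0}
NEGATIONS = {"not", "never", "no"}
NEGATION_WINDOW = 3  # assumed: a negation flips sentiment words up to 3 tokens after it

# Emoticons are matched before plain words so ":)" survives tokenization.
TOKEN_RE = re.compile(r":-?[()]|[a-z']+")


def sentence_score(sentence: str) -> float:
    """Sum word and emoticon scores, flipping words that follow a nearby negation."""
    tokens = TOKEN_RE.findall(sentence.lower())
    score = 0.0
    last_negation = None  # index of the most recent negation word
    for i, tok in enumerate(tokens):
        if tok in NEGATIONS:
            last_negation = i
        elif tok in EMOTICON_SCORES:
            score += EMOTICON_SCORES[tok]  # emoticons are never negated in this sketch
        elif tok in WORD_SCORES:
            s = WORD_SCORES[tok]
            if last_negation is not None and i - last_negation <= NEGATION_WINDOW:
                s = -s
            score += s
    return score


def document_score(text: str) -> float:
    """Average the scores of the non-empty sentences in the text."""
    sentences = [s for s in re.split(r"[.!?]+\s*", text) if s.strip()]
    if not sentences:
        return 0.0
    return sum(sentence_score(s) for s in sentences) / len(sentences)
```

For example, `document_score("Great phone! Not good battery.")` averages 2.0 and -1.0 to give 0.5.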

Design, Automation, and Test in Europe | 2007

An FPGA Implementation of Decision Tree Classification

Ramanathan Narayanan; Daniel Honbo; Gokhan Memik; Alok N. Choudhary; Joseph Zambreno

Data mining techniques are a rapidly emerging class of applications that have widespread use in several fields. One important problem in data mining is classification, which is the task of assigning objects to one of several predefined categories. Among the several solutions developed, decision tree classification (DTC) is a popular method that yields high accuracy while handling large datasets. However, DTC is a computationally intensive algorithm, and as data sizes increase, its running time can stretch to several hours. In this paper, we propose a hardware implementation of decision tree classification. We identify the compute-intensive kernel (Gini score computation) in the algorithm and develop a highly efficient architecture, which is further optimized by reordering the computations and by using a bitmapped data structure. Our implementation on a Xilinx Virtex-II Pro FPGA platform (with 16 Gini units) provides up to a 5.58× performance improvement over an equivalent software implementation.
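The kernel named here, Gini score computation, rates a candidate split by the weighted impurity of the two partitions it creates, where a partition's impurity is 1 minus the sum of its squared class proportions (lower is better). The following is a plain software sketch; the function names are illustrative and not taken from the paper. The abstract reports 16 Gini units evaluating candidates in hardware, whereas this version tries the candidate thresholds one at a time.

```python
from collections import Counter
from typing import Sequence


def gini(labels: Sequence[str]) -> float:
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def split_gini(values: Sequence[float], labels: Sequence[str], threshold: float) -> float:
    """Weighted Gini score of splitting on `value <= threshold` (lower is better)."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    return (len(left) / n) * gini(left) + (len(right) / n) * gini(right)


def best_threshold(values: Sequence[float], labels: Sequence[str]):
    """Try every distinct value as a threshold; return (score, threshold) with the lowest score."""
    return min((split_gini(values, labels, t), t) for t in sorted(set(values)))
```

For example, `best_threshold([1, 2, 3, 4], ["a", "a", "b", "b"])` returns `(0.0, 2)`, because splitting at 2 separates the two classes perfectly.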

Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery | 2011

Accelerating data mining workloads: current approaches and future challenges in system architecture design

Alok N. Choudhary; Daniel Honbo; Prabhat Kumar; Berkin Özisikyilmaz; Sanchit Misra; Gokhan Memik

Conventional systems based on general‐purpose processors cannot keep pace with the exponential increase in the generation and collection of data. It is therefore important to explore alternative architectures that can provide the computational capabilities required to analyze ever‐growing datasets. Programmable graphics processing units (GPUs) offer computational capabilities that surpass even high‐end multi‐core central processing units (CPUs), making them well‐suited for floating‐point‐ or integer‐intensive and data‐parallel operations. Field‐programmable gate arrays (FPGAs), which can be reconfigured to implement an arbitrary circuit, provide the capability to specify a customized datapath for any task. The multiple granularities of parallelism offered by FPGA architectures, as well as their high internal bandwidth, make them suitable for low‐complexity parallel computations. GPUs and FPGAs can serve as coprocessors for data mining applications, allowing the CPU to offload computationally intensive tasks for faster processing. Experiments have shown that heterogeneous architectures employing GPUs or FPGAs can result in significant application speedups over homogeneous CPU‐based systems, while increasing performance per watt.

IEEE Intelligent Systems | 2014

MuSES: Multilingual Sentiment Elicitation System for Social Media Data

Yusheng Xie; Zhengzhang Chen; Kunpeng Zhang; Yu Cheng; Daniel Honbo; Ankit Agrawal; Alok N. Choudhary

A multilingual sentiment identification system (MuSES) implements three different sentiment identification algorithms. The first algorithm augments previous compositional semantic rules by adding rules specific to social media. The second algorithm defines a scoring function that measures the degree of a sentiment, instead of simply classifying a sentiment into binary polarities. All such scores are calculated based on a large volume of customer reviews. Due to the special characteristics of social media texts, a third algorithm takes emoticons, negation word position, and domain-specific words into account. In addition, a proposed label-free process transfers multilingual sentiment knowledge between different languages. The authors conduct their experiments on user comments from Facebook, tweets from Twitter, and multilingual product reviews from Amazon.

Proceedings of the IEEE | 2006

High-Performance Software Protection Using Reconfigurable Architectures

Joseph Zambreno; Daniel Honbo; Alok N. Choudhary; Rahul Simha; Bhagirath Narahari

One of the key problems facing the computer industry today is ensuring the integrity of end-user applications and data. Researchers in the relatively new field of software protection investigate the development and evaluation of controls that prevent the unauthorized modification or use of system software. While many previously developed protection schemes have provided a strong level of security, their overall effectiveness has been hindered by a lack of transparency to the user in terms of performance overhead. Other approaches go to the opposite extreme and sacrifice security for the sake of this transparency. In this work we present an architecture for software protection that provides a high level of both security and user transparency by using field programmable gate array (FPGA) technology as the main protection mechanism. We demonstrate that by relying on FPGA technology, this approach can accelerate the execution of programs in a cryptographic environment, while retaining the flexibility, through reprogramming, to carry out any compiler-driven protections that may be application-specific.

International Conference on Computational Advances in Bio and Medical Sciences | 2011

Efficient pairwise statistical significance estimation for local sequence alignment using GPU

Yuhong Zhang; Sanchit Misra; Daniel Honbo; Ankit Agrawal; Wei-keng Liao; Alok N. Choudhary

Pairwise statistical significance has been found to be quite accurate in identifying related sequences (homologs), which is a key step in numerous bioinformatics applications. However, it is computationally and data intensive, particularly for large amounts of sequence data. To prevent it from becoming a performance bottleneck, we turn to graphics processing units (GPUs) to accelerate the computation. In this paper, we present a memory-access-optimized GPU implementation of a pairwise statistical significance estimation algorithm. By exploring the algorithm's data access characteristics, we developed a tile-based scheme that produces a contiguous memory access pattern to GPU global memory and sustains a large number of threads to achieve high GPU occupancy. Our experimental results cover both single-pair and multi-pair statistical significance estimation. The performance evaluation was carried out on an NVIDIA Tesla C2050 GPU. We observe more than 180× end-to-end speedup over the CPU implementation on an Intel® Core™ i7 processor. The proposed memory access optimizations and efficient framework are also applicable to many other sequence-comparison-based applications, such as DNA sequence mapping and database search.
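The abstract does not detail its tile-based scheme, but the goal it names, a contiguous access pattern to global memory, can be shown with address arithmetic. This is a minimal sketch assuming one thread per sequence and an interleaved (column-major) layout, a common coalescing technique and not necessarily the paper's exact tiling. Python is used only to show the index math, and both function names are hypothetical.

```python
from typing import List, Sequence


def interleave(seqs: Sequence[str], pad: str = "-") -> List[str]:
    """Lay out sequences column-major: character j of sequence i goes to index j * n + i.

    If thread i handles sequence i, then at step j the n threads read the n
    consecutive addresses j*n .. j*n + n - 1, so their loads can coalesce.
    Shorter sequences are padded with `pad`.
    """
    n = len(seqs)
    length = max((len(s) for s in seqs), default=0)
    buf = [pad] * (n * length)
    for i, s in enumerate(seqs):
        for j, ch in enumerate(s):
            buf[j * n + i] = ch
    return buf


def addresses_at_step(n_threads: int, step: int) -> List[int]:
    """Buffer indices read by threads 0..n_threads-1 at a given step."""
    return [step * n_threads + t for t in range(n_threads)]
```

For example, `interleave(["ACG", "TTA"])` yields `["A", "T", "C", "T", "G", "A"]`: at each step, the two threads read adjacent elements.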

High Performance Distributed Computing | 2010

MPIPairwiseStatSig: parallel pairwise statistical significance estimation of local sequence alignment

Ankit Agrawal; Sanchit Misra; Daniel Honbo; Alok N. Choudhary

Sequence comparison is considered a cornerstone application in bioinformatics, forming the basis of many other applications. In particular, pairwise sequence alignment is a fundamental step in numerous sequence-comparison-based applications, where its typical purpose is homology detection, i.e., identifying related sequences. Estimating the statistical significance of a pairwise sequence alignment is crucial in homology detection. A recent development in the field is the use of pairwise statistical significance as an alternative to database statistical significance. Although pairwise statistical significance has been shown to be potentially superior to database statistical significance for homology detection (evaluated in terms of retrieval accuracy), it is currently very time-consuming, since it involves generating an empirical score distribution by aligning one sequence of the sequence pair with N random shuffles of the other sequence. In this paper, we present a parallel algorithm for pairwise statistical significance estimation, called MPIPairwiseStatSig, implemented in C using MPI. Distributing the most compute-intensive portions of the pairwise statistical significance estimation procedure across multiple processors has been shown to result in near-linear speedups for the application.
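The empirical score distribution described here comes from aligning one sequence against N random shuffles of the other. This is a minimal sketch under stated assumptions: a toy match/mismatch scheme with a linear gap penalty stands in for the substitution matrices and gap model the paper would use, and significance is estimated as a simple empirical tail fraction. That is a simplification, and the published estimator may instead fit a parametric distribution to the shuffled scores. The function names are hypothetical.

```python
import random
from typing import List


def sw_score(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Smith-Waterman local alignment score with a linear gap penalty (toy scoring)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


def shuffled_scores(a: str, b: str, n_shuffles: int, seed: int = 0) -> List[int]:
    """Align `a` against N random shuffles of `b`; shuffling preserves b's composition."""
    rng = random.Random(seed)
    letters = list(b)
    scores = []
    for _ in range(n_shuffles):
        rng.shuffle(letters)
        scores.append(sw_score(a, "".join(letters)))
    return scores


def empirical_pvalue(a: str, b: str, n_shuffles: int = 200, seed: int = 0) -> float:
    """Fraction of shuffled scores at least as high as the observed score."""
    observed = sw_score(a, b)
    null = shuffled_scores(a, b, n_shuffles, seed)
    # +1 pseudocounts keep the estimate strictly above zero.
    return (1 + sum(s >= observed for s in null)) / (n_shuffles + 1)
```

The N alignments in `shuffled_scores` are independent of one another, which makes them a natural unit of work to spread across processors.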

Concurrency and Computation: Practice and Experience | 2011

Parallel pairwise statistical significance estimation of local sequence alignment using Message Passing Interface library

Ankit Agrawal; Sanchit Misra; Daniel Honbo; Alok N. Choudhary

Homology detection is a fundamental step in sequence analysis. In recent years, pairwise statistical significance has emerged as a promising alternative to database statistical significance for homology detection. Although more accurate, it is currently very time‐consuming because it involves generating tens of hundreds of alignment scores to construct the empirical score distribution. This paper presents a parallel algorithm for pairwise statistical significance estimation, called MPIPairwiseStatSig, implemented in C using the MPI library. We further apply the parallelization technique to estimate non‐conservative pairwise statistical significance using standard, sequence‐specific, and position‐specific substitution matrices, which has previously demonstrated better sequence comparison accuracy than the original pairwise statistical significance. Distributing the most compute‐intensive portions of the pairwise statistical significance estimation procedure across multiple processors has been shown to result in near‐linear speed‐ups for the application. The MPIPairwiseStatSig program for pairwise statistical significance estimation is available for free academic use at www.cs.iastate.edu/~ankitag/MPIPairwiseStatSig.html.
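Spreading the shuffled alignments over several processes requires a work decomposition. This is a minimal sketch of one plausible choice, a balanced block partition in which each MPI rank handles a contiguous range of shuffle indices. The function names are hypothetical, and this abstract does not describe the paper's actual scheme.

```python
from typing import List, Tuple


def block_range(n_items: int, n_ranks: int, rank: int) -> Tuple[int, int]:
    """Half-open [start, stop) range of item indices assigned to `rank`.

    The first n_items % n_ranks ranks receive one extra item, so per-rank
    loads differ by at most one.
    """
    if not 0 <= rank < n_ranks:
        raise ValueError("rank must be in [0, n_ranks)")
    base, extra = divmod(n_items, n_ranks)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def all_ranges(n_items: int, n_ranks: int) -> List[Tuple[int, int]]:
    """Ranges for every rank, in rank order."""
    return [block_range(n_items, n_ranks, r) for r in range(n_ranks)]
```

For example, `all_ranges(10, 3)` gives `[(0, 4), (4, 7), (7, 10)]`. Because block sizes can differ by one, collecting the per-rank scores at rank 0 would call for a variable-count collective such as `MPI_Gatherv` rather than `MPI_Gather`.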

International ACM SIGIR Conference on Research and Development in Information Retrieval | 2012

Sentiment identification by incorporating syntax, semantics and context information

Kunpeng Zhang; Yusheng Xie; Yu Cheng; Daniel Honbo; Doug Downey; Ankit Agrawal; Wei-keng Liao; Alok N. Choudhary

This paper proposes a method based on conditional random fields to incorporate sentence structure (syntax and semantics) and context information to identify sentiments of sentences within a document. It also proposes and evaluates two different active learning strategies for labeling sentiment data. The experiments with the proposed approach demonstrate a 5-15% improvement in accuracy on Amazon customer reviews compared to existing supervised learning and rule-based methods.

Conference on Information and Knowledge Management | 2012

Probabilistic macro behavioral targeting

Yusheng Xie; Yu Cheng; Daniel Honbo; Kunpeng Zhang; Ankit Agrawal; Alok N. Choudhary; Yi Gao; Jiangtao Gou

We investigate a class of emerging online marketing challenges in social networks and formally define macro behavioral targeting (MBT) as marketing efforts that appeal to a massive targeted population through non-personalized broadcasting. Building on this problem formulation, we describe a probabilistic graphical model for MBT. In our model, we derive the prior distributions from scratch because existing applications of graphical models and Bayesian networks cannot fully capture the unique characteristics of MBT. In the derivation, we propose an approximation method to circumvent an intractable situation in which order statistics would have to be calculated with exponentially increasing computation. In the experiments, we present case studies on real Facebook data.
