Publication


Featured research published by Shinjae Yoo.


knowledge discovery and data mining | 2009

Mining social networks for personalized email prioritization

Shinjae Yoo; Yiming Yang; Frank Lin; Il-Chul Moon

Email is one of the most prevalent communication tools today, and solving the email overload problem is pressingly urgent. A good way to alleviate email overload is to automatically prioritize received messages according to the priorities of each user. However, research on statistical learning methods for fully personalized email prioritization (PEP) has been sparse due to privacy issues, since people are reluctant to share personal messages and importance judgments with the research community. It is therefore important to develop and evaluate PEP methods under the assumption that only limited training examples can be available, and that the system can only have the personal email data of each user during the training and testing of the model for that user. This paper presents the first study (to the best of our knowledge) under such an assumption. Specifically, we focus on analysis of personal social networks to capture user groups and to obtain rich features that represent the social roles from the viewpoint of a particular user. We also developed a novel semi-supervised (transductive) learning algorithm that propagates importance labels from training examples to test examples through message and user nodes in a personal email network. These methods together enable us to obtain an enriched vector representation of each new email message, which consists of both standard features of an email message (such as words in the title or body, sender and receiver IDs, etc.) and the induced social features from the sender and receivers of the message. Using the enriched vector representation as the input in SVM classifiers to predict the importance level for each test message, we obtained significant performance improvement over the baseline system (without induced social features) in our experiments on a multi-user data collection. 
The relative error reduction in MAE was 31% in micro-averaging and 14% in macro-averaging.
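The difference between the micro-averaged and macro-averaged MAE figures can be illustrated with a minimal sketch. The data layout, function names, and toy numbers below are hypothetical and are not taken from the paper; the sketch only shows how pooling messages across users (micro) differs from averaging per-user scores (macro), and how a relative error reduction is computed.

```python
def mae(pairs):
    """Mean absolute error over (predicted, true) importance levels."""
    return sum(abs(p - t) for p, t in pairs) / len(pairs)


def micro_macro_mae(per_user):
    """per_user maps a user id to a list of (predicted, true) pairs."""
    # Micro-average: pool every message from every user, then average once.
    pooled = [pair for pairs in per_user.values() for pair in pairs]
    micro = mae(pooled)
    # Macro-average: compute MAE per user, then average the per-user scores.
    macro = sum(mae(pairs) for pairs in per_user.values()) / len(per_user)
    return micro, macro


def relative_error_reduction(baseline, system):
    """Fraction by which the system lowers the baseline error."""
    return (baseline - system) / baseline


# Hypothetical toy data: user "a" has many messages, user "b" has only one.
toy = {
    "a": [(1, 1), (2, 1), (3, 3), (4, 2)],
    "b": [(5, 1)],
}
micro, macro = micro_macro_mae(toy)
```

Users with many messages dominate the micro-average, while every user counts equally in the macro-average, which is one reason the two reported reductions can differ.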


international acm sigir conference on research and development in information retrieval | 2005

Robustness of adaptive filtering methods in a cross-benchmark evaluation

Yiming Yang; Shinjae Yoo; Jian Zhang; Bryan Kisiel

This paper reports a cross-benchmark evaluation of regularized logistic regression (LR) and incremental Rocchio for adaptive filtering. Using four corpora from the Topic Detection and Tracking (TDT) forum and the Text Retrieval Conferences (TREC), we evaluated these methods with non-stationary topics at various granularity levels, and measured performance with different utility settings. We found that LR performs strongly and robustly in optimizing T11SU (a TREC utility function), while Rocchio is better for optimizing Ctrk (the TDT tracking cost), a high-recall oriented objective function. Using systematic cross-corpus parameter optimization with both methods, we obtained the best results ever reported on TDT5, TREC10 and TREC11. Relevance feedback on a small portion (0.05% to 0.2%) of the TDT5 test documents yielded significant performance improvements, measuring up to a 54% reduction in Ctrk and a 20.9% increase in T11SU (with b=0.1), compared to the results of the top-performing system in TDT2004 without relevance feedback information.
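For readers unfamiliar with T11SU, the following is a minimal sketch of the scaled linear utility as it is commonly defined for the TREC-11 filtering track (raw utility of +2 per relevant and -1 per non-relevant delivered document, normalized and clipped at -0.5). The function and parameter names are assumptions made for illustration, and the abstract itself does not spell out the formula, so treat this as background rather than the authors' evaluation code.

```python
def t11su(rel_retrieved, nonrel_retrieved, total_relevant, min_nu=-0.5):
    """Scaled linear utility for one topic (assumed TREC-11 style definition)."""
    # Raw utility: +2 per relevant document delivered, -1 per non-relevant one.
    t11u = 2 * rel_retrieved - nonrel_retrieved
    # Normalize by the best achievable utility (all relevant, nothing else).
    t11nu = t11u / (2 * total_relevant)
    # Clip at min_nu and rescale so the score lies in [0, 1].
    return (max(t11nu, min_nu) - min_nu) / (1 - min_nu)


# Hypothetical topic with 10 relevant documents in the stream.
silent_system = t11su(0, 0, 10)      # delivers nothing
perfect_system = t11su(10, 0, 10)    # delivers exactly the relevant documents
noisy_system = t11su(0, 100, 10)     # delivers only non-relevant documents
```

Under this definition a system that delivers nothing still scores one third, so a filter has to be precise as well as active to beat silence.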


international conference on smart grid communications | 2013

Cloud motion estimation for short term solar irradiation prediction

Hao Huang; Jin Xu; Zhenzhou Peng; Shinjae Yoo; Dantong Yu; Dong Huang; Hong Qin

Variability of solar energy is the most significant issue for integrating solar energy into the power grid. There are pressing demands to develop methods to accurately estimate cloud motion, which directly affects the stability of solar power output. We propose a solar prediction system that can detect cloud movements from TSI (total sky imager) images, and then estimate the future cloud position over solar panels and the subsequent solar irradiance fluctuations incurred by cloud transients. Experimental studies show that our proposed approach significantly improves the quality of cloud motion estimation within a time window (up to a few minutes) that is sufficient for grid operators to take actions to mitigate the solar power volatility.


IEEE Intelligent Systems | 2010

Personalized Email Prioritization Based on Content and Social Network Analysis

Yiming Yang; Shinjae Yoo; Frank Lin; Il-Chul Moon

The proposed system combines unsupervised clustering, social network analysis, semisupervised feature induction, and supervised classification to model user priorities among incoming email messages.


Plant Journal | 2016

Large-scale atlas of microarray data reveals the distinct expression landscape of different tissues in Arabidopsis.

Fei He; Shinjae Yoo; Daifeng Wang; Sunita Kumari; Mark Gerstein; Doreen Ware; Sergei Maslov

Transcriptome data sets from thousands of samples of the model plant Arabidopsis thaliana have been collectively generated by multiple individual labs. Although integration and meta-analysis of these samples has become routine in the plant research community, it is often hampered by a lack of metadata or differences in annotation styles of different labs. In this study, we carefully selected and integrated 6057 Arabidopsis microarray expression samples from 304 experiments deposited to the Gene Expression Omnibus (GEO) at the National Center for Biotechnology Information (NCBI). Metadata such as tissue type, growth conditions and developmental stage were manually curated for each sample. We then studied the global expression landscape of the integrated data set and found that samples of the same tissue tend to be more similar to each other than to samples of other tissues, even in different growth conditions or developmental stages. Root has the most distinct transcriptome, compared with aerial tissues, but the transcriptome of cultured root is more similar to the transcriptome of aerial tissues, as the cultured root samples lost their cellular identity. Using a simple computational classification method, we showed that the tissue type of a sample can be successfully predicted based on its expression profile, opening the door for automatic metadata extraction and facilitating the re-use of plant transcriptome data. As a proof of principle, we applied our automated annotation pipeline to 708 RNA-seq samples from public repositories and verified the accuracy of our predictions with sample metadata provided by the authors.
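The abstract does not name the "simple computational classification method". As one plausible illustration only, the sketch below uses a nearest-centroid classifier with Pearson correlation as the similarity measure. All function names and the four-gene toy profiles are hypothetical and are not drawn from the study.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)  # assumes neither vector is constant


def centroids(labelled):
    """labelled: list of (tissue, expression_vector); returns mean profile per tissue."""
    groups = {}
    for tissue, vec in labelled:
        groups.setdefault(tissue, []).append(vec)
    return {
        tissue: [sum(col) / len(vecs) for col in zip(*vecs)]
        for tissue, vecs in groups.items()
    }


def predict_tissue(profile, cents):
    """Assign the tissue whose centroid correlates best with the profile."""
    return max(cents, key=lambda tissue: pearson(profile, cents[tissue]))


# Hypothetical curated samples, each with expression values for four genes.
training = [
    ("root", [9, 8, 1, 1]),
    ("root", [8, 9, 2, 1]),
    ("leaf", [1, 2, 9, 8]),
    ("leaf", [2, 1, 8, 9]),
]
cents = centroids(training)
predicted = predict_tissue([7, 9, 1, 2], cents)
```

A correlation-based similarity ignores overall intensity differences between samples, which is convenient when data come from many labs, but whether the authors made this choice is not stated in the abstract.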


conference on information and knowledge management | 2015

Unsupervised Feature Selection on Data Streams

Hao Huang; Shinjae Yoo; Shiva Prasad Kasiviswanathan

Massive data streams are continuously being generated from sources such as social media, broadcast news, etc., and typically these datapoints lie in high-dimensional spaces (such as the vocabulary space of a language). Timely and accurate feature subset selection in these massive data streams has important applications in model interpretation, computational/storage cost reduction, and generalization enhancement. In this paper, we introduce a novel unsupervised feature selection approach on data streams that selects important features by making only one pass over the data while utilizing limited storage. The proposed algorithm uses ideas from matrix sketching to efficiently maintain a low-rank approximation of the observed data and applies regularized regression on this approximation to identify the important features. We theoretically prove that our algorithm is close to an expensive offline approach based on global singular value decompositions. The experimental results on a variety of text and image datasets demonstrate the excellent ability of our approach to identify important features even in presence of concept drifts and also its efficiency over other popular scalable feature selection algorithms.


Proceedings of the Twelfth International Workshop on Multimedia Data Mining | 2012

Correlation and local feature based cloud motion estimation

Hao Huang; Shinjae Yoo; Dantong Yu; Dong Huang; Hong Qin

Short-term changes in atmospheric transmissivity caused by clouds can engender more severe fluctuations in photovoltaic (PV) outputs than those from traditional power plants. As PV energy continues to penetrate the U.S. National Energy Grid, such volatility increasingly lowers its reliability, efficiency, and value-added contribution. Therefore, a model that can accurately predict cloud motion and its effect on PV system production is in pressing demand, since it can be used to mitigate the undesired behavior beforehand. In this paper we explore the use of Total Sky Images and cloud estimation techniques based on such images. To further improve the estimation quality of motion vectors, we propose a novel hybrid algorithm that takes advantage of both correlation-based and local-feature-based approaches. Our proposed hybrid approach reduces the cloud motion prediction error rate by 25% on average, which can help to predict short-term solar energy fluctuation in our later work.
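The correlation-based half of such an approach can be sketched as an exhaustive search for the integer shift that maximizes normalized cross-correlation between two consecutive frames. This is a generic illustration, not the paper's hybrid algorithm: the local-feature component is omitted, and the function names, the tiny 6x6 frames, and the `max_shift` parameter are all assumptions.

```python
from math import sqrt


def ncc(a, b):
    """Normalized cross-correlation of two equal-length pixel lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return num / den if den else 0.0


def estimate_shift(prev, curr, max_shift=2):
    """Return (dy, dx) such that curr[y + dy][x + dx] best matches prev[y][x]."""
    h, w = len(prev), len(prev[0])
    best, best_score = (0, 0), float("-inf")
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            a, b = [], []
            # Compare only the region where both frames are defined.
            for y in range(max(0, -dy), min(h, h - dy)):
                for x in range(max(0, -dx), min(w, w - dx)):
                    a.append(prev[y][x])
                    b.append(curr[y + dy][x + dx])
            score = ncc(a, b)
            if score > best_score:
                best, best_score = (dy, dx), score
    return best


def shift_frame(frame, dy, dx):
    """Move every pixel by (dy, dx), filling uncovered pixels with 0 (clear sky)."""
    h, w = len(frame), len(frame[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if 0 <= y + dy < h and 0 <= x + dx < w:
                out[y + dy][x + dx] = frame[y][x]
    return out


# Hypothetical 6x6 sky image with one small "cloud" of distinct intensities.
prev = [[0] * 6 for _ in range(6)]
prev[1][1], prev[1][2], prev[2][1], prev[2][2] = 1, 2, 3, 4
curr = shift_frame(prev, 1, 2)  # the cloud moves 1 pixel down, 2 pixels right
motion = estimate_shift(prev, curr)
```

Exhaustive search is quadratic in `max_shift` and is applied here to the whole frame; a practical system would more likely run it per block and then reconcile the resulting vectors.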


international conference on data mining | 2011

A Robust Clustering Algorithm Based on Aggregated Heat Kernel Mapping

Hao Huang; Shinjae Yoo; Hong Qin; Dantong Yu

Current spectral clustering algorithms are sensitive both to the choice of scaling parameter in similarity matrix construction and to data perturbation. This paper aims to improve the robustness of clustering algorithms and to address these two limitations using heat kernel theory. The heat kernel statistically depicts the traces of a random walk, so it has an intrinsic connection with diffusion distance, with which we can ensure robustness during any clustering process. By integrating the heat distributed along the time scale, we propose a novel method called Aggregated Heat Kernel (AHK) to measure the distance between each point pair in their eigenspace. Using AHK and Laplace-Beltrami Normalization (LBN), we are able to apply an advanced noise-resistant, robust spectral mapping to the original dataset. Moreover, it offers stability under scaling parameter tuning. Experimental results show that, compared to other popular spectral clustering methods, our algorithm achieves robust clustering results on both synthetic and UCI real datasets.
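The idea of integrating heat along the time scale can be made concrete with the standard spectral form of the heat kernel. The derivation below is a sketch under assumptions: $L$ denotes a graph Laplacian with eigenpairs $(\lambda_i, \phi_i)$, and the integration range $[0, T]$ is chosen for illustration. The abstract does not give the exact definition or normalization of AHK, so this should be read as background rather than as the paper's formula.

```latex
% Heat kernel of a graph Laplacian L with eigenpairs (\lambda_i, \phi_i):
H_t \;=\; e^{-tL} \;=\; \sum_{i} e^{-\lambda_i t}\, \phi_i \phi_i^{\top}

% Aggregating over the time scale [0, T] integrates each mode separately:
\int_0^T H_t \, dt
  \;=\; \sum_{i:\, \lambda_i > 0} \frac{1 - e^{-\lambda_i T}}{\lambda_i}\, \phi_i \phi_i^{\top}
  \;+\; T \sum_{i:\, \lambda_i = 0} \phi_i \phi_i^{\top}
```

Integrating over time removes the need to pick a single diffusion time $t$: each non-constant mode is weighted by $(1 - e^{-\lambda_i T})/\lambda_i$, which decays smoothly with $\lambda_i$ and so damps the high-frequency modes that are most affected by noise.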


conference on information and knowledge management | 2012

Local anomaly descriptor: a robust unsupervised algorithm for anomaly detection based on diffusion space

Hao Huang; Hong Qin; Shinjae Yoo; Dantong Yu

Current popular anomaly detection algorithms are capable of detecting global anomalies but often fail to distinguish local anomalies from normal instances. This paper aims to improve unsupervised anomaly detection via the exploration of physics-based diffusion space. Building upon the embedding manifold derived from diffusion maps, we devise the Local Anomaly Descriptor (LAD), whose originality results from faithfully preserving intrinsic and informative density-relevant neighborhood information. This robust and effective algorithm is designed with a weighted umbrella Laplacian operator to bridge global and local properties. To further enhance the efficacy of our proposed algorithm, we explore the utility of the anisotropic Gaussian kernel (AGK), which can offer better manifold-aware affinity information. Comprehensive experiments on both synthetic and UCI real datasets verify that our LAD outperforms existing anomaly detection algorithms.


international conference on smart grid communications | 2013

Solar irradiance forecast system based on geostationary satellite

Zhenzhou Peng; Shinjae Yoo; Dantong Yu; Dong Huang

Solar irradiance variability, left unmitigated, will threaten the stability of the grid system and might incur significant economic impacts. This paper focuses on a pipeline to predict solar irradiance from 30 minutes to 5 hours ahead using geostationary satellite images. It consists of two parts: cloud motion estimation, and solar irradiance prediction using the estimated satellite images. The main challenge is image noise at all levels of processing, from motion estimation to irradiance prediction. To overcome this problem, we propose to use optical flow motion estimation and subsequently combine multiple pieces of evidence using robust support vector regression (SVR). Our systematic evaluation shows significant improvements over the baseline in both motion estimation and irradiance prediction.

Collaboration


Dive into Shinjae Yoo's collaborations.

Top Co-Authors

Dantong Yu

Brookhaven National Laboratory

Hao Huang

Stony Brook University

Hong Qin

Stony Brook University

Dong Huang

Brookhaven National Laboratory

Jin Xu

Stony Brook University

John Heiser

Brookhaven National Laboratory

Paul Kalb

Brookhaven National Laboratory

Dimitrios Katramatos

Brookhaven National Laboratory

Yiming Yang

Carnegie Mellon University
