Seung Jun Shin | Researchain

Publications

Featured research published by Seung Jun Shin.

Biometrics | 2014

Probability-Enhanced Sufficient Dimension Reduction for Binary Classification

Seung Jun Shin; Yichao Wu; Hao Helen Zhang; Yufeng Liu

In high-dimensional data analysis, it is of primary interest to reduce the data dimensionality without loss of information. Sufficient dimension reduction (SDR) arises in this context, and many successful SDR methods have been developed since the introduction of sliced inverse regression (SIR) [Li (1991) Journal of the American Statistical Association 86, 316-327]. Despite this rapid progress, most existing methods target regression problems with a continuous response. For binary classification problems, SIR can estimate at most one direction because only two slices are available. In this article, we develop a new and flexible probability-enhanced SDR method for binary classification problems by using the weighted support vector machine (WSVM). The key idea is to slice the data based on the conditional class probabilities of observations rather than their binary responses. We first show that the central subspace based on the conditional class probability is the same as that based on the binary response. This important result justifies the proposed slicing scheme from a theoretical perspective and ensures no information loss. In practice, the true conditional class probability is generally not available, and probability estimation can be challenging for data with high-dimensional inputs. We observe that implementing the new slicing scheme does not require exact probability values; the only required information is the relative order of the probability values. Motivated by this fact, our new SDR procedure bypasses the probability estimation step and employs the WSVM to directly estimate the order of the probability values, based on which the slicing is performed. The performance of the proposed probability-enhanced SDR scheme is evaluated on both simulated and real data examples.
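
The slicing idea can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation. It slices on the true class probability, which is known here only because the data are simulated, as a stand-in for the WSVM-estimated order. It then applies ordinary SIR. The helper names slice_by_order and sir are hypothetical. Only the ranks of the score enter slice_by_order, so any monotone transformation of the score gives the same slices. This mirrors the point that only the relative order of the probabilities matters.

```python
import numpy as np


def slice_by_order(score, n_slices):
    """Split observations into n_slices (nearly) equal-size slices by the rank of score."""
    ranks = np.argsort(np.argsort(score))       # rank 0..n-1 of each observation
    return ranks * n_slices // len(score)       # slice label in 0..n_slices-1


def sir(X, labels, d=1):
    """Sliced inverse regression: top-d directions (unit columns) and all SIR eigenvalues."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    evals, evecs = np.linalg.eigh(Xc.T @ Xc / n)
    inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T   # Sigma^{-1/2}
    Z = Xc @ inv_sqrt                                       # standardized predictors
    M = np.zeros((p, p))
    for h in np.unique(labels):
        in_h = labels == h
        m_h = Z[in_h].mean(axis=0)                          # slice mean of Z
        M += in_h.mean() * np.outer(m_h, m_h)               # weighted by slice proportion
    w, v = np.linalg.eigh(M)
    order = np.argsort(w)[::-1]                             # largest eigenvalues first
    w, v = w[order], v[:, order]
    B = inv_sqrt @ v[:, :d]                                 # map back to the X scale
    return B / np.linalg.norm(B, axis=0), w


rng = np.random.default_rng(0)
X = rng.standard_normal((2000, 5))
prob = 1.0 / (1.0 + np.exp(-2.0 * X[:, 0]))   # simulated P(Y = 1 | X), depends on x1 only
y = rng.binomial(1, prob)

# Slicing on the binary response: two slices, so the SIR matrix has rank at most one.
_, w_binary = sir(X, y)
# Slicing on the order of the class probability: ten slices.
B_prob, w_prob = sir(X, slice_by_order(prob, 10))
print(np.round(B_prob[:, 0], 2), np.round(w_binary[:2], 6))
```

With the binary labels, the second eigenvalue is zero up to rounding. The two centered slice means are proportional to each other, which illustrates the one-direction limitation. Ranking by probability allows more slices and lifts that cap.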

Journal of Computational and Graphical Statistics | 2014

Two-Dimensional Solution Surface for Weighted Support Vector Machines

Seung Jun Shin; Yichao Wu; Hao Helen Zhang

The support vector machine (SVM) is a popular learning method for binary classification. Standard SVMs treat all data points equally, but in some practical problems it is more natural to assign different weights to observations from different classes. This leads to a broader class of learning methods, the so-called weighted SVMs (WSVMs), one of whose important applications is estimating class probabilities in addition to learning the classification boundary. Two parameters are associated with the WSVM optimization problem: the regularization parameter and the weight parameter. In this article, we first establish that the WSVM solutions are jointly piecewise-linear with respect to both the regularization and weight parameters. We then develop a new algorithm that computes the entire trajectory of the WSVM solutions for every pair of regularization and weight parameters at a feasible computational cost. The derived two-dimensional solution surface provides theoretical insight into the behavior of the WSVM solutions. Numerically, the algorithm can greatly facilitate the implementation of the WSVM and automate the selection of the optimal regularization parameter. We illustrate the new algorithm on various examples. This article has online supplementary materials.
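
The following small Python sketch of a linear WSVM objective makes the two parameters concrete. It assumes the common convention of weight 1 - pi on the hinge losses of the positive class and pi on the negative class, with penalty (lam/2)||beta||^2. The function names are hypothetical, and this is not the paper's solution-surface algorithm. The toy intercept-only case shows why the weight matters for probability estimation: the optimal constant classifier switches sign exactly when pi crosses the fraction of positive labels.

```python
def hinge(u):
    """Hinge loss H(u) = max(0, 1 - u)."""
    return max(0.0, 1.0 - u)


def wsvm_objective(beta, b, X, y, pi, lam):
    """Assumed linear WSVM objective: weight (1 - pi) on y = +1, pi on y = -1, plus ridge penalty."""
    loss = 0.0
    for x_i, y_i in zip(X, y):
        f = sum(bj * xj for bj, xj in zip(beta, x_i)) + b   # f(x) = x'beta + b
        weight = 1.0 - pi if y_i == 1 else pi
        loss += weight * hinge(y_i * f)
    return loss + 0.5 * lam * sum(bj * bj for bj in beta)


def best_constant_sign(y, pi):
    """Sign of the optimal intercept-only WSVM (0 < pi < 1, both classes present).

    The objective is convex and piecewise linear in b with kinks at b = -1 and b = +1,
    so its minimum is attained at one of them; ties are reported as -1.
    """
    X = [[] for _ in y]                       # no predictors: f(x) = b
    obj_pos = wsvm_objective([], 1.0, X, y, pi, lam=0.0)
    obj_neg = wsvm_objective([], -1.0, X, y, pi, lam=0.0)
    return 1 if obj_pos < obj_neg else -1


y = [1, 1, 1, -1]                             # 75% positive labels
print({pi: best_constant_sign(y, pi) for pi in (0.5, 0.7, 0.8, 0.9)})
```

Here the objective at b = +1 is 2 * pi * n_minus and at b = -1 is 2 * (1 - pi) * n_plus. The sign therefore flips at pi = n_plus / n, which is 0.75 for this toy sample.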

Computational Statistics & Data Analysis | 2017

Penalized principal logistic regression for sparse sufficient dimension reduction

Seung Jun Shin; Andreas Artemiou

Sufficient dimension reduction (SDR) is a successful tool for reducing the dimensionality of predictors by finding the central subspace, a minimal subspace of the predictor space that preserves all the regression information. When the predictor dimension is large, it is often assumed that only a small number of predictors are informative. In this regard, sparse SDR is desired to achieve variable selection and dimension reduction simultaneously. We propose principal logistic regression (PLR) as a new SDR tool and further develop its penalized version for sparse SDR. Asymptotic analysis shows that the penalized PLR enjoys the oracle property. Numerical investigations support the advantageous performance of the proposed methods.
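
The following is a rough Python sketch of a principal-regression scheme in the spirit of PLR. It uses an assumed formulation borrowed from principal SVM-type methods, not necessarily the paper's exact estimator. The response is dichotomized at several interior quantiles, an ordinary linear logistic regression is fitted for each cut, and the leading eigenvectors of the summed outer products of the fitted slope vectors are taken. For Gaussian predictors, as simulated here, such slope vectors are known to be proportional to the true index, so the leading eigenvector should point along x1. The names logistic_slope and plr_directions are hypothetical, and the penalized (sparse) version is not shown.

```python
import numpy as np


def logistic_slope(X, t, n_iter=25):
    """Unpenalized logistic regression of binary t on X by Newton-Raphson; returns the slopes."""
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])             # prepend an intercept column
    coef = np.zeros(p + 1)
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-A @ coef))         # fitted probabilities
        grad = A.T @ (t - mu)                        # score vector
        hess = A.T @ (A * (mu * (1.0 - mu))[:, None])
        coef += np.linalg.solve(hess, grad)          # Newton step
    return coef[1:]                                  # drop the intercept


def plr_directions(X, y, d=1, n_cuts=9):
    """Dichotomize y at interior quantiles, fit logistic slopes, eigen-decompose sum of b b'."""
    cuts = np.quantile(y, np.linspace(0.0, 1.0, n_cuts + 2)[1:-1])
    p = X.shape[1]
    M = np.zeros((p, p))
    for q in cuts:
        b = logistic_slope(X, (y > q).astype(float))
        M += np.outer(b, b)
    w, v = np.linalg.eigh(M)
    return v[:, np.argsort(w)[::-1][:d]]             # top-d eigenvectors


rng = np.random.default_rng(1)
X = rng.standard_normal((1000, 4))
y = X[:, 0] + rng.standard_normal(1000)              # central subspace spanned by x1
V = plr_directions(X, y, d=1)
print(np.round(V[:, 0], 2))
```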

Biometrika | 2017

Principal weighted support vector machines for sufficient dimension reduction in binary classification

Seung Jun Shin; Yichao Wu; Hao Helen Zhang; Yufeng Liu

Sufficient dimension reduction is popular for reducing data dimensionality without stringent model assumptions. However, most existing methods may work poorly for binary classification. For example, sliced inverse regression (Li, 1991) can estimate at most one direction if the response is binary. In this paper we propose principal weighted support vector machines, a unified framework for linear and nonlinear sufficient dimension reduction in binary classification. Its asymptotic properties are studied, and an efficient computing algorithm is proposed. Numerical examples demonstrate its performance in binary classification.

Journal of the American Statistical Association | 2018

Bayesian Semiparametric Estimation of Cancer-Specific Age-at-Onset Penetrance With Application to Li-Fraumeni Syndrome

Seung Jun Shin; Ying Yuan; Louise C. Strong; Jasmina Bojadzieva; Wenyi Wang

Penetrance, which plays a key role in genetic research, is defined as the proportion of individuals with the genetic variants (i.e., genotype) that cause a particular trait who also have clinical symptoms of the trait (i.e., phenotype). We propose a Bayesian semiparametric approach to estimate the cancer-specific age-at-onset penetrance in the presence of the competing risk of multiple cancers. We employ a Bayesian semiparametric competing risk model to model the time until individuals in a high-risk group develop different cancers, and we accommodate family data using family-wise likelihoods. We tackle the ascertainment bias that arises when family data are collected through probands in a high-risk population in which disease cases are more likely to be observed. We apply the proposed method to a cohort of 186 families with Li-Fraumeni syndrome identified through probands with sarcoma treated at MD Anderson Cancer Center from 1944 to 1982. Supplementary materials for this article are available online.
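
Under competing risks, one standard way to express cancer-specific penetrance by age t is the cause-specific cumulative incidence. This is the probability of developing cancer k by age t before any other event. The Python sketch below computes it from piecewise-constant cause-specific hazards. The hazard values are hypothetical, and the sketch omits the paper's Bayesian semiparametric prior, family-wise likelihood and ascertainment correction.

```python
import math


def cumulative_incidence(hazards, widths):
    """Cause-specific cumulative incidence under piecewise-constant cause-specific hazards.

    hazards[j][k] is the hazard of cause k on age interval j, and widths[j] is the
    length of interval j. Returns [F_1, ..., F_K] at the end of the last interval.
    """
    n_causes = len(hazards[0])
    cif = [0.0] * n_causes
    surv = 1.0                                        # P(no event of any cause so far)
    for h, width in zip(hazards, widths):
        total = sum(h)                                # all-cause hazard on this interval
        p_event = 1.0 - math.exp(-total * width)      # P(some event here | event-free at start)
        for k in range(n_causes):
            if total > 0:
                cif[k] += surv * p_event * h[k] / total   # cause k's share of those events
        surv *= 1.0 - p_event
    return cif


# Hypothetical hazards (per year) for two cancer types over ages 0-20, 20-40, 40-60.
hazards = [[0.001, 0.0005], [0.004, 0.002], [0.010, 0.008]]
widths = [20.0, 20.0, 20.0]
print(cumulative_incidence(hazards, widths))   # penetrance of each cancer type by age 60
```

Within an interval, the cause-k probability is h_k / H * (1 - exp(-H * width)) times the event-free probability at the start, where H is the all-cause hazard. That is why no numerical integration is needed.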

Statistics & Probability Letters | 2017

The cumulative Kolmogorov filter for model-free screening in ultrahigh dimensional data

Arlene Kyoung Hee Kim; Seung Jun Shin

We propose a cumulative Kolmogorov filter to improve the fused Kolmogorov filter proposed by Mai and Zou (2015) via cumulative slicing. We establish an improved asymptotic result under relaxed assumptions and numerically demonstrate its enhanced finite sample performance.
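
Cumulative slicing can be sketched as follows. For each predictor, its distribution below and above a cut point of the response is compared with the two-sample Kolmogorov-Smirnov statistic, and the statistics are aggregated over many cut points. This Python sketch assumes cut points at sample deciles of the response and simple averaging. The paper's exact cut points and aggregation may differ, and the function names are hypothetical.

```python
import bisect
import random


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic sup_v |F_a(v) - F_b(v)|."""
    a, b = sorted(a), sorted(b)
    stat = 0.0
    for v in a + b:                                   # the sup is attained at a sample point
        fa = bisect.bisect_right(a, v) / len(a)       # empirical CDF of a at v
        fb = bisect.bisect_right(b, v) / len(b)       # empirical CDF of b at v
        stat = max(stat, abs(fa - fb))
    return stat


def cumulative_ks_screen(X, y, n_cuts=9):
    """Average KS statistic of X_j | y <= c versus X_j | y > c over cut points c."""
    n, p = len(y), len(X[0])
    ys = sorted(y)
    cuts = [ys[(i * n) // (n_cuts + 1)] for i in range(1, n_cuts + 1)]   # sample deciles
    scores = []
    for j in range(p):
        total = 0.0
        for c in cuts:
            low = [row[j] for row, yi in zip(X, y) if yi <= c]
            high = [row[j] for row, yi in zip(X, y) if yi > c]
            total += ks_statistic(low, high)
        scores.append(total / n_cuts)
    return scores


random.seed(1)
n = 300
X = [[random.gauss(0, 1) for _ in range(3)] for _ in range(n)]
y = [row[0] + 0.3 * random.gauss(0, 1) for row in X]   # only x1 is active
scores = cumulative_ks_screen(X, y)
print([round(s, 2) for s in scores])
```

Screening then keeps the predictors with the largest scores. Because the statistic depends on y only through the cut-point indicators, the procedure needs no model for how y depends on the predictors.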

Journal of Statistical Theory and Practice | 2017

A comparative study of the dose-response analysis with application to the target dose estimation

Seung Jun Shin; Sujit K. Ghosh

With a quantal response, the dose-response relation is summarized by the response probability function (RPF), which gives the probability of a response as a function of the dose level. In dose-response analysis (DRA), it is often of primary interest to find the dose at which a targeted response probability is attained, which we call the target dose (TD). The estimation of the TD clearly depends on the underlying RPF structure. In this article, we provide a comparative analysis of some existing and newly proposed RPF estimation methods, with particular emphasis on TD estimation. Empirical performance on simulated data is presented to compare the existing and newly proposed methods. Nonparametric models based on a sequence of Bernstein polynomials are found to be robust against model misspecification. The methods are also illustrated using data obtained from a toxicological study.
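
A Bernstein-polynomial RPF and the corresponding target dose can be sketched in Python as below. The sketch assumes doses rescaled to [0, 1], for example d = (dose - min) / (max - min). It takes the Bernstein weights as already estimated; fitting them from quantal data is not shown, and the function names are hypothetical. Nondecreasing weights make the curve nondecreasing, because its derivative is a nonnegative combination of the weight differences. The TD for a target probability tau can therefore be found by bisection.

```python
from math import comb


def bernstein_rpf(d, w):
    """Bernstein-polynomial response probability at rescaled dose d in [0, 1]."""
    m = len(w) - 1
    return sum(w_k * comb(m, k) * d**k * (1 - d) ** (m - k) for k, w_k in enumerate(w))


def target_dose(tau, w, tol=1e-10):
    """Approximately the smallest rescaled dose with RPF >= tau (assumes nondecreasing w)."""
    if not bernstein_rpf(0.0, w) <= tau <= bernstein_rpf(1.0, w):
        raise ValueError("target probability outside the range of the curve")
    lo, hi = 0.0, 1.0
    while hi - lo > tol:               # invariant: RPF(hi) >= tau
        mid = 0.5 * (lo + hi)
        if bernstein_rpf(mid, w) < tau:
            lo = mid
        else:
            hi = mid
    return hi


w = [0.02, 0.05, 0.2, 0.6, 0.9]        # hypothetical nondecreasing weights
print(target_dose(0.5, w))             # rescaled dose with 50% response probability
```

Because RPF(0) = w[0] and RPF(1) = w[-1], the first and last weights directly set the response probabilities at the lowest and highest doses.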

Biometrical Journal | 2014