Jongwoo Song | Researchain

Publication

Featured research published by Jongwoo Song.

Toxicology and Applied Pharmacology | 2003

Microarray analysis of changes in bone cell gene expression early after cadmium gavage in mice.

Akhila Regunathan; David Glesne; Allison K. Wilson; Jongwoo Song; Dan L. Nicolae; Tony Flores; Maryka H. Bhattacharyya

We developed an in vivo model for cadmium-induced bone loss in which mice excrete bone mineral in feces beginning 8 h after cadmium gavage. Female mice of three strains [CF1, MTN (metallothionein-wild-type), and MT1,2KO (MT1,2-deficient)] were placed on a low-calcium diet for 2 weeks. Each mouse was gavaged with 200 µg Cd or vehicle only. Fecal calcium was monitored daily for 9 days, beginning 4 days before cadmium gavage, to document the bone response. For CF1 mice, bones were taken from four groups: +/- Cd, 2 h after Cd and +/- Cd, 4 h after Cd. MTN and MT1,2KO strains had two groups each: +/- Cd, 4 h after Cd. PolyA+ RNA preparations from marrow-free shafts of femora and tibiae of each +/- Cd pair were submitted to Incyte Genomics for microarray analysis. Fecal Ca results showed that bone calcium excreted after cadmium differed for the three mouse strains: CF1, 0.24 +/- 0.08 mg; MTN, 0.92 +/- 0.22 mg; and MT1,2KO, 1.7 +/- 0.4 mg. Gene array results showed that nearly all arrayed genes were unaffected by cadmium. However, MT1 and MT2 had Cd+/Cd- expression ratios >1 in all four groups, while all ratios for MT3 were essentially 1, showing specificity. Both probes for MAPK 14 (p38 MAPK) had expression ratios >1, while no other MAPK responded to cadmium. Vacuolar proton pump ATPase and integrin alpha v (osteoclast genes), transferrin receptor, and src-like adaptor protein genes were stimulated by Cd; other src-related genes were unaffected. Genes for bone formation, stress response, growth factors, and signaling molecules showed little or no response to cadmium. Results support the hypothesis that Cd stimulates bone demineralization via a p38 MAPK pathway involving osteoclast activation.

Computational Statistics & Data Analysis | 2012

A quantile estimation for massive data with generalized Pareto distribution

Jongwoo Song; Seongjoo Song

This paper proposes a new method of estimating extreme quantiles of heavy-tailed distributions for massive data. The method combines the Peaks Over Threshold (POT) method with the generalized Pareto distribution (GPD), which is commonly used to estimate extreme quantiles, and GPD parameter estimation based on the empirical distribution function (EDF) and nonlinear least squares (NLS). We first estimate the GPD parameters using EDF and NLS and then estimate multiple high quantiles for massive data from observations over a certain threshold value using the conventional POT. The simulation results demonstrate that our parameter estimation method has a smaller mean squared error (MSE) than other common methods when the shape parameter of GPD is at least 0. The estimated quantiles also show the best performance in terms of root MSE (RMSE) and absolute relative bias (ARB) for heavy-tailed distributions.
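
The conventional POT quantile step mentioned in this abstract can be sketched as follows. This is a minimal illustration of the textbook POT estimator, not the paper's EDF/NLS fitting procedure: the function name `pot_quantile` and its arguments are hypothetical, and the GPD scale `sigma` and shape `xi` are assumed to have been estimated already by whatever method is chosen.

```python
import math


def pot_quantile(u, sigma, xi, n, n_exceed, p):
    """Textbook POT estimate of the p-quantile above threshold u.

    u        : threshold
    sigma    : fitted GPD scale (> 0), assumed already estimated
    xi       : fitted GPD shape
    n        : total number of observations
    n_exceed : number of observations exceeding u
    p        : target probability level, must lie above 1 - n_exceed / n
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    if not 0 < n_exceed <= n:
        raise ValueError("n_exceed must be in (0, n]")
    # Tail probability of the target quantile, rescaled to the exceedances.
    tail_ratio = (n / n_exceed) * (1.0 - p)
    if not 0.0 < tail_ratio <= 1.0:
        raise ValueError("p must correspond to a quantile above the threshold")
    if abs(xi) < 1e-12:
        # Limiting exponential-tail case (shape parameter equal to 0).
        return u - sigma * math.log(tail_ratio)
    return u + (sigma / xi) * (tail_ratio ** (-xi) - 1.0)


if __name__ == "__main__":
    # Illustrative numbers only: 1000 observations, 100 above the threshold.
    print(pot_quantile(u=0.0, sigma=1.0, xi=0.5, n=1000, n_exceed=100, p=0.99))
```

Because the estimate depends on the data only through `u`, the exceedance count, and the two fitted parameters, several high quantiles can be read off from a single fit by varying `p`.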

Computational Statistics & Data Analysis | 2010

Estimating the mixing proportion in a semiparametric mixture model

Seongjoo Song; Dan L. Nicolae; Jongwoo Song

In this paper, we investigate methods of estimating the mixing proportion in the case when one of the probability densities is not specified analytically in a mixture model. The methodology we propose is motivated by a sequential clustering algorithm. After a sequential clustering algorithm finds the center of a cluster, the next step is to identify observations belonging to that cluster. If we assume that the center of the cluster is known and that the distribution of observations not belonging to the cluster is unknown, the problem of identifying observations in the cluster is similar to the problem of estimating the mixing proportion in a special two-component mixture model. The mixing proportion can be considered as the proportion of observations belonging to the cluster. We propose two estimators for parameters in the model and compare the performance of these two estimators in several different cases.

Korean Journal of Applied Statistics | 2013

Value at Risk with Peaks over Threshold: Comparison Study of Parameter Estimation

Minjung Kang; Jiyeon Kim; Jongwoo Song; Seongjoo Song

The importance of financial risk management has been highlighted by several recent global financial crises. One of the issues in financial risk management is how to measure risk; currently, the most widely used risk measure is Value at Risk (VaR). If financial data have heavy tails, as in recent market trends, VaR can be estimated using extreme value theory. In this paper, we study the estimation of VaR using Peaks over Threshold (POT), a common method of modeling fat-tailed data with extreme value theory. To use POT, we first estimate the parameters of the Generalized Pareto Distribution (GPD). Here, we compare three methods of estimating the GPD parameters by comparing the performance of the estimated VaR on KOSPI 5-minute data. In addition, we simulate data from normal inverse Gaussian distributions and examine two GPD parameter estimation methods. We find that the recent methods of GPD parameter estimation work better than maximum likelihood estimation when the kurtosis of the KOSPI return distribution is very high, and the simulation experiment shows similar results.

Korean Journal of Applied Statistics | 2008

Nonlinear Regression for an Asymptotic Option Price

Seongjoo Song; Jongwoo Song

This paper approaches the problem of option pricing in an incomplete market, where the underlying asset price process follows a compound Poisson model. We assume that the price process follows a compound Poisson model under an equivalent martingale measure and it converges weakly to the Black-Scholes model. First, we express the option price as the expectation of the discounted payoff and expand it at the Black-Scholes price to obtain a pricing formula with three unknown parameters. Then we estimate those parameters using the market option data. This method can use the option data on the same stock with different expiration dates and different strike prices.
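
As a point of reference for the expansion described in this abstract, the Black-Scholes call price around which the option price is expanded can be computed as below. This sketch shows only the standard Black-Scholes baseline; the paper's three-parameter correction terms are not reproduced here, and the function names are hypothetical.

```python
import math


def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def black_scholes_call(spot, strike, rate, vol, maturity):
    """Standard Black-Scholes price of a European call (no dividends)."""
    sqrt_t = math.sqrt(maturity)
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol ** 2) * maturity) / (vol * sqrt_t)
    d2 = d1 - vol * sqrt_t
    # Discounted expected payoff under the risk-neutral measure.
    return spot * norm_cdf(d1) - strike * math.exp(-rate * maturity) * norm_cdf(d2)


if __name__ == "__main__":
    # Illustrative inputs only, not data from the paper.
    print(black_scholes_call(spot=100.0, strike=100.0, rate=0.05, vol=0.2, maturity=1.0))
```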

Expert Systems With Applications | 2018

Feature selection for continuous aggregate response and its application to auto insurance data

Suyeon Kang; Jongwoo Song

This paper presents new feature selection algorithms for aggregate data analysis. Data aggregation is commonly used when it is not appropriate to model the relationship between a response and explanatory variables at the individual level. We investigate substantial challenges in the analysis of aggregate data. Then, we propose a groupwise feature selection method that addresses (i) the change in dataset depending on the selection of predictor variables, (ii) the presence of potential missing responses, and (iii) the suitability of model selection criteria when comparing models using different datasets. In an application to real auto insurance data, we find a set of important predictors to classify the policyholders into homogeneous risk groups. Our results clearly demonstrate the potential of the proposed feature selection method for aggregate data analysis in terms of flexibility and computational complexity. We expect that the proposed algorithms can be further applied to a wide range of decision-making tasks using aggregate data, as they are applicable to any type of data.

BMC Bioinformatics | 2017

Robust gene selection methods using weighting schemes for microarray data analysis

Suyeon Kang; Jongwoo Song

Background: A common task in microarray data analysis is to identify informative genes that are differentially expressed between two different states. Owing to the high-dimensional nature of microarray data, identification of significant genes has been essential in analyzing the data. However, the performances of many gene selection techniques are highly dependent on the experimental conditions, such as the presence of measurement error or a limited number of sample replicates.

Results: We have proposed new filter-based gene selection techniques, by applying a simple modification to significance analysis of microarrays (SAM). To prove the effectiveness of the proposed method, we considered a series of synthetic datasets with different noise levels and sample sizes along with two real datasets. The following findings were made. First, our proposed methods outperform conventional methods for all simulation set-ups. In particular, our methods are much better when the given data are noisy and sample size is small. They showed relatively robust performance regardless of noise level and sample size, whereas the performance of SAM became significantly worse as the noise level became high or sample size decreased. When sufficient sample replicates were available, SAM and our methods showed similar performance. Finally, our proposed methods are competitive with traditional methods in classification tasks for microarrays.

Conclusions: The results of simulation study and real data analysis have demonstrated that our proposed methods are effective for detecting significant genes and classification tasks, especially when the given data are noisy or have few sample replicates. By employing weighting schemes, we can obtain robust and reliable results for microarray data analysis.
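
For orientation, the baseline SAM statistic that this work modifies can be sketched as below. This is the standard two-class relative-difference score only; the weighting schemes proposed in the paper are not specified in the abstract and are not reproduced. The function name `sam_statistic` is hypothetical, and the fudge factor `s0` is passed in directly rather than tuned across genes as SAM normally does.

```python
import math


def sam_statistic(x, y, s0=0.0):
    """Baseline SAM relative difference for one gene.

    x, y : expression values of the gene in the two states
    s0   : small positive constant added to the denominator (assumed given)
    """
    n1, n2 = len(x), len(y)
    if n1 + n2 <= 2 or n1 == 0 or n2 == 0:
        raise ValueError("need at least three observations across two non-empty groups")
    m1 = sum(x) / n1
    m2 = sum(y) / n2
    # Pooled within-group sum of squares.
    ss = sum((v - m1) ** 2 for v in x) + sum((v - m2) ** 2 for v in y)
    s = math.sqrt((1.0 / n1 + 1.0 / n2) * ss / (n1 + n2 - 2))
    if s + s0 == 0.0:
        raise ValueError("zero denominator: constant data and s0 == 0")
    return (m2 - m1) / (s + s0)


if __name__ == "__main__":
    # Toy values for illustration only.
    print(sam_statistic([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))
    print(sam_statistic([1.0, 2.0, 3.0], [4.0, 5.0, 6.0], s0=0.5))
```

A positive `s0` damps the score of genes whose estimated standard error is very small, which is the usual motivation for including it when there are few replicates.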

Korean Journal of Applied Statistics | 2015

Classification Analysis for Unbalanced Data

Dongah Kim; Suyeon Kang; Jongwoo Song

We study the unbalanced classification problem, in which the proportions of the two groups differ significantly. It is usually more difficult to classify classes accurately in unbalanced data than in balanced data. If we apply classification methods to unbalanced data, most observations are likely to be classified into the bigger group because doing so minimizes the misclassification loss. However, misclassifying the smaller group as the larger group can cause a bigger loss in most real applications. We compare several classification methods for unbalanced data using sampling techniques (up and down sampling). We also check the total loss of different classification methods when an asymmetric loss is applied to simulated and real data. We use the misclassification rate, G-mean, ROC, and AUC (area under the curve) for the performance comparison.
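
The contrast between the misclassification rate and G-mean on unbalanced data can be illustrated with a small sketch. The helper names and the toy labels below are hypothetical and are not taken from the paper; G-mean is computed in its usual form as the geometric mean of sensitivity and specificity.

```python
import math


def misclassification_rate(y_true, y_pred):
    """Fraction of observations whose predicted label is wrong."""
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return wrong / len(y_true)


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity and specificity."""
    tp = fn = tn = fp = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        else:
            if p == positive:
                fp += 1
            else:
                tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return math.sqrt(sensitivity * specificity)


if __name__ == "__main__":
    # Toy data: 5 minority cases (label 1) among 100 observations.
    y_true = [1] * 5 + [0] * 95
    always_majority = [0] * 100
    print(misclassification_rate(y_true, always_majority))
    print(g_mean(y_true, always_majority))
```

A classifier that assigns everything to the larger group looks accurate by misclassification rate yet scores zero on G-mean, which is why the two measures are reported side by side.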

Korean Journal of Applied Statistics | 2015

A Study on Domestic Drama Rating Prediction

Suyeon Kang; Heejeong Jeon; Jihye Kim; Jongwoo Song

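Audience rating competition in the domestic drama market has increased recently due to the introduction of commercial broadcasting and the diversification of channels. There is now a need for thorough studies and analysis of audience ratings. In particular, a drama rating is an important measure for producers and advertisers to estimate advertisement costs. In this paper, we study drama rating prediction models using various data mining techniques such as linear regression, LASSO regression, random forest, and gradient boosting. The analysis results show that initial drama ratings are affected by structural elements such as broadcasting station and broadcasting time. Average drama ratings are also influenced by earlier public opinion, such as the number of internet searches about the drama.

Keywords: drama rating, linear regression, LASSO regression, random forest, gradient boosting, important variables

1. Introduction

Recently, dramas that cover diverse subjects and cast many famous actors have been appearing in the drama market. After the Broadcasting Act was revised at the end of 2011 and the general programming channels launched, the drama market, which had centered on terrestrial broadcasters, expanded to cable and general programming channels, and the ways of watching dramas diversified to include smartphones and tablets. Although the broadcasting market is changing this rapidly, ratings are still measured by traditional TV-based methods and serve program producers and advertisers as a very important measure for determining the scale of advertising costs. In particular, because dramas carry more significance than any other program genre in terms of popular appeal and the resulting social influence (Bae, 2005), predicting drama ratings is very important for producers and advertisers. Research on drama ratings has been under way for a long time, but most studies are based on ordinary regression models and restrict their analysis to the dramas of a particular broadcaster or of terrestrial broadcasters. To overcome these limitations, this study proposes various statistical prediction models that take recent changes in the broadcasting market into account. The purpose of this study is to present rating prediction models that apply various data mining techniques to dramas from terrestrial broadcasters, cable, and general programming channels alike, and the important influences on rating prediction ...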

Communications for Statistical Applications and Methods | 2010

Option Pricing with Bounded Expected Loss under Variance-Gamma Processes

Seongjoo Song; Jongwoo Song

Exponential Lévy models have recently become popular for modeling price processes in mathematical finance. Although such a model is a relatively simple extension of geometric Brownian motion, it makes the market incomplete, so the option price is not uniquely determined. To find an appropriate price for an option, we suppose a situation where a hedger wants to invest as little as possible initially, but wants the expected squared loss at the end not to exceed a certain constant. For this, we assume that the underlying price process follows a variance-gamma model and that it converges to a geometric Brownian motion as its quadratic variation converges to a constant. In the limit, we use the mean-variance approach to find the asymptotic minimum investment with the expected squared loss bounded. Some numerical results are also provided.
