Noseong Park

University of Maryland, College Park

Publication

Featured research published by Noseong Park.

IEEE Transactions on Computational Social Systems | 2015

APE: A Data-Driven, Behavioral Model-Based Anti-Poaching Engine

Noseong Park; Edoardo Serra; Tom Snitch; V. S. Subrahmanian

We consider the problem of protecting a set of animals such as rhinos and elephants in a game park using D drones and R ranger patrols (on the ground) with R ≥ D. Using two years of data about animal movements in a game park, we propose the probabilistic spatio-temporal graph (pSTG) model of animal movement behaviors and show how we can learn it from the movement data. Using 17 months of data about poacher behavior, we also learn the probability that a region in the game park will be targeted by poachers. We formalize the anti-poaching problem as that of finding a coordinated route for the drones and ranger patrols that maximizes the expected number of animals that are protected, given these two models as input, and show that it is NP-complete. Because of this, we fine-tune classical local search and genetic algorithms to the case of anti-poaching by taking specific advantage of the nature of the anti-poaching problem and its objective function. We develop a measure of the quality of an algorithm to route the drones and ranger patrols called “improvement ratio.” We develop a dynamic-programming-based APE_Coord_Route algorithm and show that it performs very well in practice, achieving an improvement ratio over 90%.

Web Search and Data Mining | 2016

Ensemble Models for Data-driven Prediction of Malware Infections

Chanhyun Kang; Noseong Park; B. Aditya Prakash; Edoardo Serra; V. S. Subrahmanian

Given a history of detected malware attacks, can we predict the number of malware infections in a country? Can we do this for different malware and countries? This is an important question which has numerous implications for cyber security, from designing better anti-virus software, to designing and implementing targeted patches, to more accurately measuring the economic impact of breaches. This problem is compounded by the fact that, as externals, we can only detect a fraction of actual malware infections. In this paper we address this problem using data from Symantec covering more than 1.4 million hosts and 50 malware spread across 2 years and multiple countries. We first carefully design domain-based features from both malware and machine-hosts perspectives. Secondly, inspired by epidemiological and information diffusion models, we design a novel temporal non-linear model for malware spread and detection. Finally, we present ESM, an ensemble-based approach which combines both these methods to construct a more accurate algorithm. Using extensive experiments spanning multiple malware and countries, we show that ESM can effectively predict malware infection ratios over time (both the actual number and trend) up to 4 times better than several baselines on various metrics. Furthermore, ESM's performance is stable and robust even when the number of detected infections is low.

European Conference on Information Retrieval | 2017

We Used Neural Networks to Detect Clickbaits: You Won’t Believe What Happened Next!

Ankesh Anand; Tanmoy Chakraborty; Noseong Park

Online content publishers often use catchy headlines for their articles in order to attract users to their websites. These headlines, popularly known as clickbaits, exploit a user’s curiosity gap and lure them to click on links that often disappoint them. Existing methods for automatically detecting clickbaits rely on heavy feature engineering and domain knowledge. Here, we introduce a neural network architecture based on Recurrent Neural Networks for detecting clickbaits. Our model relies on distributed word representations learned from a large unannotated corpus, and character embeddings learned via Convolutional Neural Networks. Experimental results on a dataset of news headlines show that our model outperforms existing techniques for clickbait detection, with an accuracy of 0.98, an F1-score of 0.98, and a ROC-AUC of 0.99.

IEEE Intelligent Systems | 2015

Saving Rhinos with Predictive Analytics

Noseong Park; Edoardo Serra; V. S. Subrahmanian

This article, the first entry in the new Predictive Analytics column, looks at the problem of animal poaching. The authors describe their Anti-Poaching Engine system, which builds on behavior models of both rhinos and poachers to protect as many animals as possible.

International Semantic Web Conference | 2013

Personalized Best Answer Computation in Graph Databases

Michael Ovelgönne; Noseong Park; V. S. Subrahmanian; Elizabeth K. Bowman; Kirk Ogaard

Though subgraph matching has been extensively studied as a query paradigm in semantic web and social network data environments, a user can get a large number of answers in response to a query. Just like Google does, these answers can be shown to the user in accordance with an importance ranking. In this paper, we present scalable algorithms to find the top-K answers to a practically important subset of SPARQL-queries, denoted as importance queries, via a suite of pruning techniques. We test our algorithms on multiple real-world graph data sets, showing that our algorithms are efficient even on networks with up to 6M vertices and 15M edges and far more efficient than popular triple stores.
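
As a rough illustration of what returning the top-K answers by importance means, here is a minimal Python sketch. It is not the paper's algorithm: it scores every answer and applies no pruning. The vertex scores, the answer tuples, and the choice of summing vertex scores as an answer's importance are all hypothetical assumptions made for the example.

```python
import heapq

# Hypothetical per-vertex importance scores (assumed for illustration).
vertex_score = {"a": 0.9, "b": 0.4, "c": 0.7, "d": 0.1}

# Hypothetical query answers: each answer is a tuple of matched vertices.
answers = [("a", "b"), ("c", "d"), ("a", "c"), ("b", "d")]


def importance(answer):
    """Assumed importance of an answer: the sum of its vertex scores."""
    return sum(vertex_score[v] for v in answer)


def top_k(candidates, k):
    """Return the k highest-importance answers, best first."""
    return heapq.nlargest(k, candidates, key=importance)


best = top_k(answers, 2)
print(best)
```

Because this version enumerates and scores all answers before ranking, its cost grows with the total number of matches, which is the situation the abstract's pruning techniques are meant to address.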

Knowledge Discovery and Data Mining | 2016

MAP: Frequency-Based Maximization of Airline Profits based on an Ensemble Forecasting Approach

Bo An; Haipeng Chen; Noseong Park; V. S. Subrahmanian

Though there are numerous traditional models to predict market share and demand along airline routes, the prediction of existing models is not precise enough and, to the best of our knowledge, there is no use of data-mining based forecasting techniques to improve airline profitability. We propose the MAP (Maximizing Airline Profits) architecture designed to help airlines and make two key contributions: airline market share and route demand prediction, and prediction-based airline profit optimization. Compared with past methods to forecast market share and demand along airline routes, we introduce a novel Ensemble Forecasting (MAP-EF) approach considering two new classes of features: (i) features derived from clusters of similar routes, and (ii) features based on equilibrium pricing. We show that MAP-EF achieves much better Pearson Correlation Coefficients (over 0.95 vs. 0.82 for market share, 0.98 vs. 0.77 for demand) and R² values compared with three state-of-the-art works for forecasting market share and demand, while showing much lower variance. Using the results of MAP-EF, we develop MAP-Bilevel Branch and Bound (MAP-BBB) and MAP-Greedy (MAP-G) algorithms to optimally allocate flight frequencies over multiple routes, to maximize an airline's profit. Experimental results show that airlines can increase profits by a significant margin. All experiments were conducted with data aggregated from four sources: US Bureau of Transportation Statistics (BTS), US Bureau of Economic Analysis (BEA), the National Transportation Safety Board (NTSB), and the US Census Bureau (CB).
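
For readers unfamiliar with the metric quoted above, the following Python sketch computes a Pearson correlation coefficient between actual and predicted values. It only illustrates the evaluation metric, not the MAP-EF model itself, and the `actual_share` and `predicted_share` numbers are hypothetical, not data from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical market-share values for four routes (illustration only).
actual_share = [0.10, 0.25, 0.40, 0.55]
predicted_share = [0.12, 0.22, 0.43, 0.50]

r = pearson(actual_share, predicted_share)
print(round(r, 3))
```

A value close to 1 means the predictions rise and fall with the actual values; it does not by itself say the predictions are unbiased, which is one reason the abstract also reports R² values.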

IEEE Transactions on Information Forensics and Security | 2017

A Probabilistic Logic of Cyber Deception

Sushil Jajodia; Noseong Park; Fabio Pierazzi; Andrea Pugliese; Edoardo Serra; Gerardo I. Simari; V. S. Subrahmanian

Malicious attackers often scan nodes in a network in order to identify vulnerabilities that they may exploit as they traverse the network. In this paper, we propose that the system generate a mix of true and false answers in response to scan requests. If the attacker believes that all scan results are true, then he will be on a wrong path. If he believes some scan results are faked, he will have to expend time and effort in order to separate fact from fiction. We propose a probabilistic logic of deception and show that various computations are NP-hard. We model the attacker’s state and show the effects of faked scan results. We then show how the defender can generate fake scan results in different states that minimize the damage the attacker can produce. We develop a Naive-PLD algorithm and a Fast-PLD heuristic algorithm for the defender to use and show experimentally that the latter performs well in a fraction of the run time of the former. We ran detailed experiments to assess the performance of these algorithms and further show that by running Fast-PLD off-line and storing the results, we can very efficiently answer run-time scan requests.

Advances in Social Networks Analysis and Mining | 2016

Ensemble-based algorithms to detect disjoint and overlapping communities in networks

Tanmoy Chakraborty; Noseong Park; V. S. Subrahmanian

Given a set AL of community detection algorithms and a graph G as inputs, we propose two ensemble methods EnDisCo and MeDOC that (respectively) identify disjoint and overlapping communities in G. EnDisCo transforms a graph into a latent feature space by leveraging multiple base solutions and discovers disjoint community structure. MeDOC groups similar base communities into a meta-community and detects both disjoint and overlapping community structures. Experiments are conducted at different scales on both synthetically generated networks as well as on several real-world networks for which the underlying ground-truth community structure is available. Our extensive experiments show that both algorithms outperform state-of-the-art non-ensemble algorithms by a significant margin. Moreover, we compare EnDisCo and MeDOC with a recent ensemble method for disjoint community detection and show that our approaches achieve superior performance. To the best of our knowledge, MeDOC is the first ensemble approach for overlapping community detection.

ACM Transactions on Internet Technology | 2018

SHARE: A Stackelberg Honey-Based Adversarial Reasoning Engine

Sushil Jajodia; Noseong Park; Edoardo Serra; V. S. Subrahmanian

A “noisy-rich” (NR) cyber-attacker (Lippmann et al. 2012) is one who tries all available vulnerabilities until he or she successfully compromises the targeted network. We develop an adversarial foundation, based on Stackelberg games, for how NR-attackers will explore an enterprise network and how they will attack it, based on the concept of a system vulnerability dependency graph. We develop a mechanism by which the network can be modified by the defender to induce deception by placing honey nodes and apparent vulnerabilities into the network to minimize the expected impact of the NR-attacker’s attacks (according to multiple measures of impact). We also consider the case where the adversary learns from blocked attacks using reinforcement learning. We run detailed experiments with real network data (but with simulated attack data) and show that Stackelberg Honey-based Adversarial Reasoning Engine performs very well, even when the adversary deviates from the initial assumptions made about his or her behavior. We also develop a method for the attacker to use reinforcement learning when his or her activities are stopped by the defender. We propose two stopping policies for the defender: Stop Upon Detection allows the attacker to learn about the defender’s strategy and (according to our experiments) leads to significant damage in the long run, whereas Stop After Delay allows the defender to introduce greater uncertainty into the attacker, leading to better defendability in the long run.

ACM Conference on Hypertext | 2017

SENA: Preserving Social Structure for Network Embedding

Sanghyun Hong; Tanmoy Chakraborty; Sungjin Ahn; Ghaith Husari; Noseong Park

Network embedding transforms a network into a continuous feature space. Network augmentation, on the other hand, leverages this feature representation to obtain a more informative network by adding potentially plausible edges while removing noisy edges. Traditional network embedding methods are often inefficient in capturing (i) the latent relationship when the network is sparse (the network sparsity problem), and (ii) the local and global neighborhood structure of vertices (the structure preserving problem). We propose SENA, a structural embedding and network augmentation framework for social network analysis. Unlike other embedding methods which only generate vertex features, SENA generates features for both vertices and relations (edges) by minimizing a well-designed objective function composed of a loss function and a regularization term. The loss function reduces the network sparsity problem by learning from both the edges present (true edges) and absent (false edges) in the network, whereas the regularization term preserves the structural properties of the network by efficiently considering (i) the local neighborhood of vertices and edges, and (ii) the network spectra, i.e., eigenvectors of a symmetric matrix representing the network. We compare SENA with four baseline network embedding methods, namely DeepWalk, SE, SME and TransE. We demonstrate the efficacy of SENA through a task-based evaluation setting on different real-world networks. We consider the state-of-the-art algorithms for (i) community detection, (ii) link prediction and (iii) knowledge graph query answering, and show that with SENA's representation, these algorithms achieve up to 10%, 9% and (surprisingly) 108% higher accuracy respectively compared to the best baseline embedding methods.