Hung-Hsuan Chen
Pennsylvania State University
Publications
Featured research published by Hung-Hsuan Chen.
ACM Symposium on Applied Computing | 2012
Hung-Hsuan Chen; Liang Gou; Xiaolong Zhang; C. Lee Giles
Vertex similarity measures are useful tools for discovering hidden relationships between vertices in a complex network. We introduce relation strength similarity (RSS), a vertex similarity measure that can better capture potential relationships in real-world network structures. RSS is unique in that it is an asymmetric measure, which can be used for more general-purpose social network analysis; it allows users to explicitly specify the relation strength between neighboring vertices for initialization; and it offers a discovery range parameter that users can adjust for extended network degree search. To show the potential of vertex similarity measures and the superiority of RSS over other measures, we conduct experiments on two real networks, a biological network and a coauthorship network. Experimental results show that RSS is better at discovering the hidden relationships of the networks.
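A minimal Python sketch of an RSS-style computation, assuming the following formulation: the relation strength from a vertex to a neighbor is the user-specified link weight normalized by the sum of the vertex's outgoing weights, and the similarity from a source to a target sums, over all simple paths of at most `r` hops (the discovery range), the product of relation strengths along the path. The function names and the dict-of-dicts graph encoding are illustrative assumptions. Because normalization uses the weights of the vertex a link leaves from, the score is asymmetric.

```python
from typing import Dict, Set

# graph[a][b] = user-specified relation strength weight on the link a -> b
Graph = Dict[str, Dict[str, float]]


def relation_strength(g: Graph, a: str, b: str) -> float:
    """Weight of a -> b normalized by all of a's outgoing weights (0 if not adjacent)."""
    out = g.get(a, {})
    total = sum(out.values())
    return out.get(b, 0.0) / total if total else 0.0


def rss(g: Graph, source: str, target: str, r: int = 3) -> float:
    """Sum, over simple paths of at most r hops, of the product of relation strengths."""
    score = 0.0

    def walk(node: str, strength: float, hops: int, visited: Set[str]) -> None:
        nonlocal score
        for nbr in g.get(node, {}):
            if nbr in visited:
                continue  # keep paths simple
            s = strength * relation_strength(g, node, nbr)
            if nbr == target:
                score += s  # a complete path to the target
            elif hops + 1 < r:
                walk(nbr, s, hops + 1, visited | {nbr})  # extend within range r

    walk(source, 1.0, 0, {source})
    return score


# Toy graph: "a" splits its weight between "b" and "c", while "b" links only to "a".
example: Graph = {"a": {"b": 1.0, "c": 3.0}, "b": {"a": 1.0}, "c": {"a": 3.0}}
print(rss(example, "a", "b", r=1), rss(example, "b", "a", r=1))  # 0.25 1.0
```

The path enumeration grows exponentially with `r`, so this sketch is only meant for small discovery ranges on small graphs.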
AI Magazine | 2015
Jian Wu; Kyle Williams; Hung-Hsuan Chen; Madian Khabsa; Cornelia Caragea; Suppawong Tuarob; Alexander G. Ororbia; Douglas Jordan; Prasenjit Mitra; C. Lee Giles
CiteSeerX is a digital library search engine providing access to more than five million scholarly documents with nearly a million users and millions of hits per day. We present key AI technologies used in the following components: document classification and de-duplication, document and citation clustering, automatic metadata extraction and indexing, and author disambiguation. These AI technologies have been developed by CiteSeerX group members over the past 5–6 years. We show the usage status, payoff, development challenges, main design concepts, and deployment and maintenance requirements. We also present AI technologies implemented in table and algorithm search, which are special search modes in CiteSeerX. While it is challenging to rebuild a system like CiteSeerX from scratch, many of these AI technologies are transferable to other digital libraries and/or search engines.
ACM/IEEE Joint Conference on Digital Libraries | 2010
Liang Gou; Xiaolong Zhang; Hung-Hsuan Chen; Jung-Hyun Kim; C. Lee Giles
In search engines, ranking algorithms measure the importance and relevance of documents mainly based on the contents of, and relationships between, documents. User attributes are usually not considered in ranking. This user-neutral approach, however, may not meet the diverse interests of users, who may want different documents even for the same query. To satisfy this need for more personalized ranking, we propose a ranking framework, Social Network Document Rank (SNDocRank), that considers both document contents and the relationship between a searcher and document owners in a social network. This method combines traditional tf-idf ranking of document contents with our Multi-level Actor Similarity (MAS) algorithm, which measures to what extent document owners and the searcher are structurally similar in a social network. We implemented our ranking method in a simulated video social network based on data extracted from YouTube and tested its effectiveness on video search. The results show that, compared with traditional ranking methods such as tf-idf, the SNDocRank algorithm returns more relevant documents. More specifically, a searcher can get significantly better results by being in a larger social network, having more friends, and being associated with larger local communities in a social network.
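The abstract does not give the formula that combines the two signals, so the following is only a sketch under stated assumptions: each document gets a content score (a plain tf-idf sum here) and a social score between the searcher and the document owner, mixed linearly with a hypothetical weight `alpha`. The social score uses Jaccard overlap of friend sets as a hypothetical stand-in for MAS, which the abstract describes as a structural, multi-level similarity and is not reproduced here. All function names are illustrative.

```python
import math
from collections import Counter
from typing import Dict, List, Set


def tf_idf_scores(query: List[str], docs: Dict[str, List[str]]) -> Dict[str, float]:
    """Content score: sum over query terms of tf(term, doc) * log(N / df(term))."""
    n = len(docs)
    df = Counter(term for words in docs.values() for term in set(words))
    scores = {}
    for doc_id, words in docs.items():
        tf = Counter(words)
        scores[doc_id] = sum(tf[t] * math.log(n / df[t]) for t in query if df[t])
    return scores


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Hypothetical stand-in for MAS: overlap of two users' friend sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_documents(query: List[str], searcher: str, docs: Dict[str, List[str]],
                   owners: Dict[str, str], friends: Dict[str, Set[str]],
                   alpha: float = 0.5) -> List[str]:
    """Hypothetical linear mix of normalized content score and searcher-owner similarity."""
    content = tf_idf_scores(query, docs)
    top = max(content.values(), default=0.0) or 1.0  # scale content scores into [0, 1]
    scored = []
    for doc_id in docs:
        social = jaccard(friends.get(searcher, set()), friends.get(owners[doc_id], set()))
        scored.append((alpha * content[doc_id] / top + (1 - alpha) * social, doc_id))
    return [doc_id for _, doc_id in sorted(scored, reverse=True)]


docs = {"v1": ["cat", "video"], "v2": ["cat", "video"], "v3": ["dog", "video"]}
owners = {"v1": "u1", "v2": "u2", "v3": "u3"}
friends = {"me": {"x", "y"}, "u1": {"z"}, "u2": {"x", "y"}, "u3": set()}
print(rank_documents(["cat"], "me", docs, owners, friends))  # ['v2', 'v1', 'v3']
```

In this toy case `v1` and `v2` are equally relevant by content, and `v2` is promoted because its owner shares the searcher's friends, which is the kind of personalization the abstract describes.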
ACM/IEEE Joint Conference on Digital Libraries | 2014
Zhaohui Wu; Jian Wu; Madian Khabsa; Kyle Williams; Hung-Hsuan Chen; Wenyi Huang; Suppawong Tuarob; Sagnik Ray Choudhury; Alexander G. Ororbia; Prasenjit Mitra; C. Lee Giles
We introduce a Big Data platform that provides various services for harvesting scholarly information and enabling efficient scholarly applications. The core architecture of the platform is built on a secured private cloud: it crawls data using a scholarly-focused crawler that leverages a dynamic scheduler, processes the data with a MapReduce-based crawl-extraction-ingestion (CEI) workflow, and stores it in distributed repositories and databases. Services such as scholarly data harvesting, information extraction, and user information and log data analytics are integrated into the platform and exposed through OAI and RESTful APIs. We also introduce a set of scholarly applications built on top of this platform, including citation recommendation and collaborator discovery.
European Conference on Information Retrieval | 2014
Cornelia Caragea; Jian Wu; Alina Ciobanu; Kyle Williams; Juan Fernández-Ramírez; Hung-Hsuan Chen; Zhaohui Wu; C. Lee Giles
The CiteSeerX digital library stores and indexes research articles in Computer Science and related fields. Although its main purpose is to make it easier for researchers to search for scientific information, CiteSeerX has proven to be a powerful resource in many data mining, machine learning, and information retrieval applications that use rich metadata, e.g., titles, abstracts, authors, venues, reference lists, etc. Metadata extraction in CiteSeerX is done using automated techniques. Although fairly accurate, these techniques still produce noisy metadata. Since the performance of models trained on these data depends heavily on data quality, we propose an approach to CiteSeerX metadata cleaning that incorporates information from an external data source. The result is a subset of CiteSeerX that is substantially cleaner than the entire set. Our goal is to make the new dataset available to the research community to facilitate future work in Information Retrieval.
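The abstract names neither the external source nor the matching rule. A minimal sketch, assuming records are matched on a normalized title and that a matched record takes its fields from the external source; the record layout and field names are hypothetical. Dropping unmatched records is what makes the output a cleaner subset rather than a corrected copy of the whole collection.

```python
import re
from typing import Dict, List

Record = Dict[str, str]  # hypothetical record layout: field name -> value


def normalize_title(title: str) -> str:
    """Lowercase and collapse non-alphanumeric runs so minor punctuation noise still matches."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def clean_subset(noisy: List[Record], external: List[Record]) -> List[Record]:
    """Keep only records that match an external record, preferring the external metadata."""
    index = {normalize_title(r["title"]): r for r in external}
    cleaned = []
    for rec in noisy:
        match = index.get(normalize_title(rec["title"]))
        if match is not None:
            cleaned.append({**rec, **match})  # external fields override noisy ones
        # unmatched records are dropped, so the result is a subset
    return cleaned


noisy = [
    {"id": "1", "title": "Vertex Similarity -- A Study", "venue": "SAC 2O12"},
    {"id": "2", "title": "Unmatched paper", "venue": "??"},
]
external = [{"title": "Vertex similarity: a study", "venue": "SAC 2012"}]
print(clean_subset(noisy, external))
```

Exact matching on normalized titles favors precision over coverage; a fuzzier matcher would keep more records at the risk of reintroducing noise.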
Data Engineering for Wireless and Mobile Access | 2009
Liang Gou; Jung-Hyun Kim; Hung-Hsuan Chen; Jason Collins; Marc Goodman; Xiaolong Zhang; C. Lee Giles
This paper presents MobiSNA, a mobile video social networking application that supports the exploration, sharing, and creation of video content through social networks. The MobiSNA project provides users with an easy-to-use experience for accessing video content from mobile devices (e.g., mobile phones, PDAs) over wireless broadband networks (e.g., 4G networks). This demo focuses on the key functions of MobiSNA, which support social network-based video exploration, real-time video sharing, video blogging, video interest groups, and video story construction. A system architecture for MobiSNA is also proposed.
Advances in Social Networks Analysis and Mining | 2013
Hung-Hsuan Chen; C. Lee Giles
Discovering similar objects in a social network raises many interesting problems. Here, we present ASCOS, an Asymmetric Structure COntext Similarity measure that captures the similarity score between any pair of nodes in a network. The definition of ASCOS is similar to that of the well-known SimRank, since both define score values recursively. However, we show that ASCOS outputs a more complete similarity score than SimRank, because SimRank (and several of its variations, such as P-Rank and SimFusion) on average ignores half of the paths between nodes during calculation. To make ASCOS tractable in both computation time and memory usage, we propose two variations of ASCOS: a low-rank approximation based approach and an iterative Gauss-Seidel solver for linear equations. When the target network is sparse, the run time and required memory of these variations are smaller than those of computing SimRank and ASCOS directly. In addition, the iterative solver divides the original network into several independent sub-systems, so that a multi-core server or a distributed computing environment, such as MapReduce, can solve the problem efficiently. We compare the performance of ASCOS with other global structure based similarity measures, including SimRank, Katz, and LHN. Experimental results based on user evaluation suggest that ASCOS gives better results than the other measures. In addition, the asymmetric property has the potential to identify the hierarchical structure of a network. Finally, variations of ASCOS (including one distributed variation) can also reduce computation in both space and time.
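A minimal sketch of ASCOS on an unweighted, undirected graph, assuming the recursive definition $s_{ii} = 1$ and, for $i \neq j$, $s_{ij} = \frac{c}{|N(i)|} \sum_{k \in N(i)} s_{kj}$ with a decay factor $c \in (0, 1)$. Since column $j$ of the similarity matrix depends only on other entries of the same column, each column is an independent linear system; the sketch solves each one with Gauss-Seidel sweeps, which illustrates the decomposition the abstract describes for multi-core or MapReduce settings. The low-rank variant is not shown, and the function and parameter names are illustrative.

```python
from typing import Dict, List


def ascos(adj: Dict[int, List[int]], c: float = 0.9, tol: float = 1e-10,
          max_sweeps: int = 1000) -> Dict[int, Dict[int, float]]:
    """Return sim[i][j] for all node pairs; adj maps each node to its neighbor list."""
    nodes = list(adj)
    sim: Dict[int, Dict[int, float]] = {i: {} for i in nodes}
    for j in nodes:  # each target column is an independent sub-system
        col = {i: (1.0 if i == j else 0.0) for i in nodes}
        for _ in range(max_sweeps):
            delta = 0.0
            for i in nodes:
                if i == j or not adj[i]:
                    continue  # s_jj is fixed at 1; isolated nodes stay at 0
                new = c * sum(col[k] for k in adj[i]) / len(adj[i])
                delta = max(delta, abs(new - col[i]))
                col[i] = new  # Gauss-Seidel: later rows in this sweep see the update
            if delta < tol:
                break
        for i in nodes:
            sim[i][j] = col[i]
    return sim


# Path graph 0 - 1 - 2: node 0's only neighbor is 1, but node 1 splits attention.
S = ascos({0: [1], 1: [0, 2], 2: [1]}, c=0.9)
print(round(S[0][1], 4), round(S[1][0], 4))  # 0.9 0.7563
```

The two printed scores differ, which shows the asymmetry: node 0 sees node 1 as its whole neighborhood, while node 1 divides its score between nodes 0 and 2 (analytically $s_{10} = c / (2 - c^2)$ for this path).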
International World Wide Web Conferences | 2010
Liang Gou; Hung-Hsuan Chen; Jung-Hyun Kim; C. Lee Giles
To improve search results for socially connected users, we propose a ranking framework, Social Network Document Rank (SNDocRank). This framework considers both document contents and the similarity between a searcher and document owners in a social network, and it uses a Multi-level Actor Similarity (MAS) algorithm to efficiently calculate user similarity in a social network. Our experimental results based on YouTube data show that, compared with the tf-idf algorithm, the SNDocRank method returns more relevant documents of interest. Our findings suggest that in this framework, a searcher can improve search results by joining larger social networks, having more friends, and connecting to larger local communities in a social network.
International Conference on Social Computing | 2012
Hung-Hsuan Chen; Liang Gou; Xiaolong Zhang; C. Lee Giles
For social networks, prediction of new links or edges can be important for many reasons, in particular for understanding future network growth. Recent work has shown that graph vertex similarity measures are good at predicting link formation in the near future but are less effective at predicting links further out. This could imply that recent links are more important than older links in link prediction. To see whether this is indeed the case, we apply a new relation strength similarity (RSS) measure to a coauthorship network constructed from a subset of the CiteSeerX dataset to study the power of recency. We choose RSS because it is one of the few similarity measures designed for weighted networks and it easily models FOAF networks. By assigning different weights to the links according to the authors' coauthoring history, we show that recency is helpful in predicting the formation of new links.
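The abstract does not state the weighting scheme. One simple way to make recent collaborations count more, assuming an exponential decay with a hypothetical `half_life` parameter (in years), is to weight each coauthorship edge by a decayed count of joint papers. The resulting dict-of-dicts weights could then serve as the user-specified relation strengths of a weighted similarity measure such as RSS. The function name and input format are illustrative.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Tuple


def recency_weights(papers: List[Tuple[int, List[str]]], current_year: int,
                    half_life: float = 2.0) -> Dict[str, Dict[str, float]]:
    """Edge weight = sum over joint papers of 0.5 ** (age_in_years / half_life)."""
    g: Dict[str, Dict[str, float]] = defaultdict(dict)
    for year, authors in papers:
        w = 0.5 ** ((current_year - year) / half_life)  # newer papers weigh more
        for a, b in combinations(sorted(set(authors)), 2):
            g[a][b] = g[a].get(b, 0.0) + w  # coauthorship is symmetric,
            g[b][a] = g[b].get(a, 0.0) + w  # so update both directions
    return dict(g)


papers = [(2012, ["A", "B"]), (2008, ["A", "C"])]
weights = recency_weights(papers, current_year=2012)
print(weights["A"])  # {'B': 1.0, 'C': 0.25}
```

With a two-year half-life, a four-year-old collaboration counts a quarter as much as a current one; setting a very large `half_life` recovers plain coauthorship counts, which gives a recency-free baseline for comparison.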
Document Engineering | 2014
Kyle Williams; Hung-Hsuan Chen; C. Lee Giles
Source retrieval for plagiarism detection involves using a search engine to retrieve candidate sources of plagiarism for a given suspicious document so that more accurate comparisons can be made. An important consideration is that only documents that are likely to be sources of plagiarism should be retrieved, so as to minimize the number of unnecessary comparisons. A supervised strategy for source retrieval is described whereby search results are classified and ranked as potential sources of plagiarism without retrieving the search result documents, using only the information available at search time. The performance of the supervised method is compared to a baseline method and shown to improve precision by up to 3.28%, recall by up to 2.6%, and the F1 score by up to 3.37%. Furthermore, features are analyzed to determine which are most important for search result classification; features based on the similarity between the document and the search results appear to be the most important.
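A minimal sketch of the idea, assuming each search result exposes only a title, a snippet, and a URL at search time: compute similarity features against the suspicious document without downloading the result, score them with a logistic model, and keep only results above a threshold. The two features, the `weights`, `bias`, and `threshold` values are hypothetical placeholders; in the supervised setting the abstract describes, a classifier would be trained on labeled search results instead.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Sequence


def term_vector(text: str) -> Counter:
    """Bag-of-words term counts over lowercase alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def features(suspicious: str, result: Dict[str, str]) -> List[float]:
    """Hypothetical search-time features: document/title and document/snippet similarity."""
    doc = term_vector(suspicious)
    return [cosine(doc, term_vector(result["title"])), cosine(doc, term_vector(result["snippet"]))]


def select_sources(suspicious: str, results: List[Dict[str, str]],
                   weights: Sequence[float] = (0.3, 0.7), bias: float = -0.2,
                   threshold: float = 0.5) -> List[str]:
    """Return URLs of results classified as likely sources, highest score first."""
    scored = []
    for r in results:
        z = bias + sum(w * f for w, f in zip(weights, features(suspicious, r)))
        p = 1.0 / (1.0 + math.exp(-z))  # logistic score in (0, 1)
        if p >= threshold:
            scored.append((p, r["url"]))
    return [url for _, url in sorted(scored, reverse=True)]


suspicious = "the quick brown fox jumps over the lazy dog"
results = [
    {"url": "u1", "title": "quick brown fox", "snippet": "the quick brown fox jumps over the lazy dog"},
    {"url": "u2", "title": "stock market news", "snippet": "prices rose today"},
]
print(select_sources(suspicious, results))  # ['u1']
```

Raising `threshold` discards more results before any full-text comparison, trading recall for precision, which is the balance the reported precision, recall, and F1 figures measure.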