Kyumin Lee
Utah State University
Publication
Featured research published by Kyumin Lee.
conference on information and knowledge management | 2010
Zhiyuan Cheng; James Caverlee; Kyumin Lee
We propose and evaluate a probabilistic framework for estimating a Twitter user's city-level location based purely on the content of the user's tweets, even in the absence of any other geospatial cues. By augmenting the massive human-powered sensing capabilities of Twitter and related microblogging services with content-derived location information, this framework can overcome the sparsity of geo-enabled features in these services and enable new location-based personalized information services, the targeting of regional advertisements, and so on. Three of the key features of the proposed approach are: (i) its reliance purely on tweet content, meaning no need for user IP information, private login information, or external knowledge bases; (ii) a classification component for automatically identifying words in tweets with a strong local geo-scope; and (iii) a lattice-based neighborhood smoothing model for refining a user's location estimate. The system estimates k possible locations for each user in descending order of confidence. On average we find that the location estimates converge quickly (needing just 100s of tweets), placing 51% of Twitter users within 100 miles of their actual location.
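The idea of ranking k candidate cities from tweet content alone can be illustrated with a minimal sketch. It assumes a small set of tweets with known cities is available for counting, and it scores each city by summing a per-word estimate of p(city | word) over the user's words. The function names and the scoring rule are illustrative assumptions, not the paper's implementation; the local-word classifier and the lattice-based smoothing are omitted.

```python
from collections import Counter, defaultdict


def train_word_city_counts(labeled_tweets):
    """Count how often each word appears in tweets from each city.

    labeled_tweets: iterable of (city, text) pairs with known locations
    (a hypothetical training set, assumed for this sketch).
    """
    counts = defaultdict(Counter)
    for city, text in labeled_tweets:
        for word in text.lower().split():
            counts[word][city] += 1
    return counts


def estimate_cities(user_tweets, counts, k=3):
    """Return up to k cities in descending order of accumulated score.

    Each word occurrence adds its empirical p(city | word) to the city's score;
    words never seen in training are ignored.
    """
    scores = Counter()
    for text in user_tweets:
        for word in text.lower().split():
            city_counts = counts.get(word)
            if not city_counts:
                continue
            total = sum(city_counts.values())
            for city, n in city_counts.items():
                scores[city] += n / total
    return [city for city, _ in scores.most_common(k)]
```

Words that occur almost exclusively in one city dominate the score in this sketch, which is a rough stand-in for the "strong local geo-scope" words the abstract refers to.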
international acm sigir conference on research and development in information retrieval | 2010
Kyumin Lee; James Caverlee; Steve Webb
Web-based social systems enable new community-based opportunities for participants to engage, share, and interact. This community value and related services like search and advertising are threatened by spammers, content polluters, and malware disseminators. In an effort to preserve community value and ensure long-term success, we propose and evaluate a honeypot-based approach for uncovering social spammers in online social systems. Two of the key components of the proposed approach are: (1) The deployment of social honeypots for harvesting deceptive spam profiles from social networking communities; and (2) Statistical analysis of the properties of these spam profiles for creating spam classifiers to actively filter out existing and new spammers. We describe the conceptual framework and design considerations of the proposed approach, and we present concrete observations from the deployment of social honeypots in MySpace and Twitter. We find that the deployed social honeypots identify social spammers with low false positive rates and that the harvested spam data contains signals that are strongly correlated with observable profile features (e.g., content, friend information, posting patterns, etc.). Based on these profile features, we develop machine learning based classifiers for identifying previously unknown spammers with high precision and a low rate of false positives.
international world wide web conferences | 2013
Krishna Yeswanth Kamath; James Caverlee; Kyumin Lee; Zhiyuan Cheng
We conduct a study of the spatio-temporal dynamics of Twitter hashtags through a sample of 2 billion geo-tagged tweets. In our analysis, we (i) examine the impact of location, time, and distance on the adoption of hashtags, which is important for understanding meme diffusion and information propagation; (ii) examine the spatial propagation of hashtags through their focus, entropy, and spread; and (iii) present two methods that leverage the spatio-temporal propagation of hashtags to characterize locations. Based on this study, we find that although hashtags are a global phenomenon, the physical distance between locations is a strong constraint on the adoption of hashtags, both in terms of the hashtags shared between locations and in the timing of when these hashtags are adopted. We find both spatial and temporal locality as most hashtags spread over small geographical areas but at high speeds. We also find that hashtags are mostly a local phenomenon with long-tailed life spans. These (and other) findings have important implications for a variety of systems and applications, including targeted advertising, location-based services, social media search, and content delivery networks.
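The abstract names focus and entropy as measures of a hashtag's spatial propagation but gives no formulas. The sketch below assumes one common formulation: focus is the location holding the largest share of a hashtag's occurrences (together with that share), and entropy is the Shannon entropy, in bits, of the hashtag's distribution over locations. The function name and both definitions are assumptions made for illustration.

```python
import math
from collections import Counter


def focus_and_entropy(locations):
    """Summarize where a hashtag occurs.

    locations: non-empty list of location labels, one per occurrence of the hashtag.
    Returns (focus_location, focus_share, entropy_in_bits).
    """
    counts = Counter(locations)
    total = sum(counts.values())
    probs = {loc: n / total for loc, n in counts.items()}
    focus_loc = max(probs, key=probs.get)
    entropy = -sum(p * math.log2(p) for p in probs.values())
    return focus_loc, probs[focus_loc], entropy
```

Under these assumed definitions, a hashtag seen in only one location has focus share 1 and entropy 0, while a hashtag spread evenly over many locations has a low focus share and high entropy.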
intelligent user interfaces | 2014
Kyumin Lee; Jalal Mahmud; Jilin Chen; Michelle X. Zhou; Jeffrey Nichols
There has been much effort on studying how social media sites, such as Twitter, help propagate information in different situations, including spreading alerts and SOS messages in an emergency. However, existing work has not addressed how to actively identify and engage the right strangers at the right time on social media to help effectively propagate intended information within a desired time frame. To address this problem, we have developed two models: (i) a feature-based model that leverages people's exhibited social behavior, including the content of their tweets and social interactions, to characterize their willingness and readiness to propagate information on Twitter via the act of retweeting; and (ii) a wait-time model based on a user's previous retweeting wait times to predict her next retweeting time when asked. Based on these two models, we build a recommender system that predicts the likelihood of a stranger to retweet information when asked, within a specific time window, and recommends the top-N qualified strangers to engage with. Our experiments, including live studies in the real world, demonstrate the effectiveness of our work.
international world wide web conferences | 2010
Kyumin Lee; James Caverlee; Steve Webb
We present the conceptual framework of the Social Honeypot Project for uncovering social spammers who target online communities and initial empirical results from Twitter and MySpace. Two of the key components of the Social Honeypot Project are: (1) The deployment of social honeypots for harvesting deceptive spam profiles from social networking communities; and (2) Statistical analysis of the properties of these spam profiles for creating spam classifiers to actively filter out existing and new spammers.
Proceedings of the 2nd Joint WICOW/AIRWeb Workshop on Web Quality | 2012
Kyumin Lee; James Caverlee; Krishna Yeswanth Kamath; Zhiyuan Cheng
We examine the problem of collective attention spam, in which spammers target social media where user attention quickly coalesces and then collectively focuses around a phenomenon. Compared to many existing spam types, collective attention spam relies on the users themselves to seek out the content -- like breaking news, viral videos, and popular memes -- where the spam will be encountered, potentially increasing its effectiveness and reach. We study the presence of collective attention spam in one popular service, Twitter, and we develop spam classifiers to detect spam messages generated by collective attention spammers. Since many instances of collective attention are bursty and unexpected, it is difficult to build spam detectors to pre-screen them before they arise; hence, we examine the effectiveness of quickly learning a classifier based on the first moments of a bursting phenomenon. Through initial experiments over a small set of trending topics on Twitter, we find encouraging results, suggesting that collective attention spam may be identified early in its life cycle and shielded from the view of unsuspecting social media users.
international acm sigir conference on research and development in information retrieval | 2015
Amir Fayazi; Kyumin Lee; James Caverlee; Anna Cinzia Squicciarini
Online reviews are a cornerstone of consumer decision making. However, their authenticity and quality have proven hard to control, especially as polluters target these reviews to promote products or degrade competitors. In a troubling direction, the widespread growth of crowdsourcing platforms like Mechanical Turk has created a large-scale, potentially difficult-to-detect workforce of malicious review writers. Hence, this paper tackles the challenge of uncovering crowdsourced manipulation of online reviews through a three-part effort: (i) First, we propose a novel sampling method for identifying products that have been targeted for manipulation and a seed set of deceptive reviewers who have been enlisted through crowdsourcing platforms. (ii) Second, we augment this base set of deceptive reviewers through a reviewer-reviewer graph clustering approach based on a Markov Random Field where we define individual potentials (of single reviewers) and pair potentials (between two reviewers). (iii) Finally, we embed the results of this probabilistic model into a classification framework for detecting crowd-manipulated reviews. We find that the proposed approach achieves up to 0.96 AUC, outperforming both traditional detection methods and a SimRank-based alternative clustering approach.
ACM Transactions on Intelligent Systems and Technology | 2015
Kyumin Lee; Jalal Mahmud; Jilin Chen; Michelle X. Zhou; Jeffrey Nichols
There has been much effort on studying how social media sites, such as Twitter, help propagate information in different situations, including spreading alerts and SOS messages in an emergency. However, existing work has not addressed how to actively identify and engage the right strangers at the right time on social media to help effectively propagate intended information within a desired time frame. To address this problem, we have developed three models: (1) a feature-based model that leverages people's exhibited social behavior, including the content of their tweets and social interactions, to characterize their willingness and readiness to propagate information on Twitter via the act of retweeting; (2) a wait-time model based on a user's previous retweeting wait times to predict his or her next retweeting time when asked; and (3) a subset selection model that automatically selects a subset of people from a set of available people using probabilities predicted by the feature-based model and maximizes retweeting rate. Based on these three models, we build a recommender system that predicts the likelihood of a stranger to retweet information when asked, within a specific time window, and recommends the top-N qualified strangers to engage with. Our experiments, including live studies in the real world, demonstrate the effectiveness of our work.
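A rough sketch of how a retweet probability and past wait times might be combined into a top-N recommendation follows. It assumes each candidate already has a predicted retweet probability (standing in for the feature-based model) and a list of past wait times in minutes, and it estimates the chance of responding within the window as the fraction of past waits that fit inside it. The field names `p_retweet`, `wait_times`, and `name`, and the product scoring rule, are hypothetical and are not taken from the paper.

```python
import heapq


def within_window_rate(wait_times, window):
    """Fraction of past retweet wait times that fall within the time window."""
    if not wait_times:
        return 0.0
    return sum(1 for t in wait_times if t <= window) / len(wait_times)


def recommend(candidates, window, n):
    """Return the names of the n highest-scoring candidates.

    candidates: list of dicts with hypothetical keys 'name', 'p_retweet',
    and 'wait_times' (minutes). Score = p_retweet * within-window rate.
    """
    scored = [
        (c["p_retweet"] * within_window_rate(c["wait_times"], window), c["name"])
        for c in candidates
    ]
    return [name for _, name in heapq.nlargest(n, scored)]
```

In this sketch a candidate with a high predicted retweet probability but consistently slow responses can rank below a less willing but faster one, which reflects the abstract's emphasis on retweeting within a specific time window.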
ACM Transactions on Intelligent Systems and Technology | 2013
Zhiyuan Cheng; James Caverlee; Kyumin Lee
Highly dynamic real-time microblog systems have already published petabytes of real-time human sensor data in the form of status updates. However, the lack of user adoption of geo-based features per user or per post signals that the promise of microblog services as location-based sensing systems may have only limited reach and impact. Thus, in this article, we propose and evaluate a probabilistic framework for estimating a microblog user's location based purely on the content of the user's posts. Our framework can overcome the sparsity of geo-enabled features in these services and bring augmented scope and breadth to emerging location-based personalized information services. Three of the key features of the proposed approach are: (i) its reliance purely on publicly available content; (ii) a classification component for automatically identifying words in posts with a strong local geo-scope; and (iii) a lattice-based neighborhood smoothing model for refining a user's location estimate. On average we find that the location estimates converge quickly, placing 51% of users within 100 miles of their actual location.
conference on information and knowledge management | 2011
Zhiyuan Cheng; James Caverlee; Krishna Yeswanth Kamath; Kyumin Lee
The emergence of location sharing services is rapidly accelerating the convergence of our online and offline activities. In one direction, Foursquare, Google Latitude, Facebook Places, and related services are enriching real-world venues with the social and semantic connections among online users. In analogy to how clickstreams have been successfully incorporated into traditional web ranking based on content and link analysis, we propose to mine traffic patterns revealed through location sharing services to augment traditional location-based search. Concretely, we study location-based traffic patterns revealed through location sharing services and find that these traffic patterns can identify semantically related locations. Based on this observation, we propose and evaluate a traffic-driven location clustering algorithm that can group semantically related locations with high confidence. Through experimental study of 12 million locations from Foursquare, we extend this result through supervised location categorization, wherein traffic patterns can be used to accurately predict the semantic category of uncategorized locations. Based on these results, we show how traffic-driven semantic organization of locations may be naturally incorporated into location-based web search.