James Caverlee | Researchain

Publication

Featured research published by James Caverlee.

conference on information and knowledge management | 2010

You are where you tweet: a content-based approach to geo-locating twitter users

Zhiyuan Cheng; James Caverlee; Kyumin Lee

We propose and evaluate a probabilistic framework for estimating a Twitter user's city-level location based purely on the content of the user's tweets, even in the absence of any other geospatial cues. By augmenting the massive human-powered sensing capabilities of Twitter and related microblogging services with content-derived location information, this framework can overcome the sparsity of geo-enabled features in these services and enable new location-based personalized information services, the targeting of regional advertisements, and so on. Three of the key features of the proposed approach are: (i) its reliance purely on tweet content, meaning no need for user IP information, private login information, or external knowledge bases; (ii) a classification component for automatically identifying words in tweets with a strong local geo-scope; and (iii) a lattice-based neighborhood smoothing model for refining a user's location estimate. The system estimates k possible locations for each user in descending order of confidence. On average we find that the location estimates converge quickly (needing just 100s of tweets), placing 51% of Twitter users within 100 miles of their actual location.

international acm sigir conference on research and development in information retrieval | 2010

Uncovering social spammers: social honeypots + machine learning

Kyumin Lee; James Caverlee; Steve Webb

Web-based social systems enable new community-based opportunities for participants to engage, share, and interact. This community value and related services like search and advertising are threatened by spammers, content polluters, and malware disseminators. In an effort to preserve community value and ensure long-term success, we propose and evaluate a honeypot-based approach for uncovering social spammers in online social systems. Two of the key components of the proposed approach are: (1) The deployment of social honeypots for harvesting deceptive spam profiles from social networking communities; and (2) Statistical analysis of the properties of these spam profiles for creating spam classifiers to actively filter out existing and new spammers. We describe the conceptual framework and design considerations of the proposed approach, and we present concrete observations from the deployment of social honeypots in MySpace and Twitter. We find that the deployed social honeypots identify social spammers with low false positive rates and that the harvested spam data contains signals that are strongly correlated with observable profile features (e.g., content, friend information, posting patterns, etc.). Based on these profile features, we develop machine learning based classifiers for identifying previously unknown spammers with high precision and a low rate of false positives.

computational science and engineering | 2009

Ranking Comments on the Social Web

Chiao-Fang Hsu; Elham Khabiri; James Caverlee

We study how an online community perceives the relative quality of its own user-contributed content, which has important implications for the successful self-regulation and growth of the Social Web in the presence of increasing spam and a flood of Social Web metadata. We propose and evaluate a machine learning-based approach for ranking comments on the Social Web based on the community's expressed preferences, which can be used to promote high-quality comments and filter out low-quality comments. We study several factors impacting community preference, including the contributor's reputation and community activity level, as well as the complexity and richness of the comment. Through experiments, we find that the proposed approach results in significant improvement in ranking quality versus alternative approaches.

conference on information and knowledge management | 2013

Location prediction in social media based on tie strength

Jeffrey McGee; James Caverlee; Zhiyuan Cheng

We propose a novel network-based approach for location estimation in social media that integrates evidence of the social tie strength between users for improved location estimation. Concretely, we propose a location estimator -- FriendlyLocation -- that leverages the relationship between the strength of the tie between a pair of users, and the distance between the pair. Based on an examination of over 100 million geo-encoded tweets and 73 million Twitter user profiles, we identify several factors such as the number of followers and how the users interact that can strongly reveal the distance between a pair of users. We use these factors to train a decision tree to distinguish between pairs of users who are likely to live nearby and pairs of users who are likely to live in different areas. We use the results of this decision tree as the input to a maximum likelihood estimator to predict a user's location. We find that this proposed method significantly improves the results of location estimation relative to a state-of-the-art technique. Our system reduces the average error distance for 80% of Twitter users from 40 miles to 21 miles using only information from the user's friends and friends-of-friends, which has great significance for augmenting traditional social media and enriching location-based services with more refined and accurate location estimates.

Information Sciences | 2010

The SocialTrust framework for trusted social information management: Architecture and algorithms

James Caverlee; Ling Liu; Steve Webb

Social information systems are a promising new paradigm for large-scale distributed information management, as evidenced by the success of large-scale information sharing communities, social media sites, and web-based social networks. But the increasing reliance on these social systems also places individuals and their computer systems at risk, creating opportunities for malicious participants to exploit the tight social fabric of these networks. With these problems in mind, this manuscript presents the SocialTrust framework for enabling trusted social information management in Internet-scale social information systems. Concretely, we study online social networks, consider a number of vulnerabilities inherent in online social networks, and introduce the SocialTrust framework for supporting tamper-resilient trust establishment. We study three key factors for trust establishment in online social networks - trust group feedback, distinguishing the user's relationship quality from trust, and tracking user behavior - and describe a principled approach for assessing each component. In addition to the SocialTrust framework, which provides a network-wide perspective on the trust of all users, we describe a personalized extension called mySocialTrust, which provides a user-centric trust perspective that can be optimized for individual users within the network. Finally, we experimentally evaluate the SocialTrust framework using real online social networking data consisting of millions of MySpace profiles and relationships. While other trust aggregation approaches have been developed and implemented by others, we note that it is rare to find such a large-scale experimental evaluation that carefully considers the important factors impacting the trust framework. We find that SocialTrust supports robust trust establishment even in the presence of large-scale collusion by malicious participants.

international world wide web conferences | 2013

Spatio-temporal dynamics of online memes: a study of geo-tagged tweets

Krishna Yeswanth Kamath; James Caverlee; Kyumin Lee; Zhiyuan Cheng

We conduct a study of the spatio-temporal dynamics of Twitter hashtags through a sample of 2 billion geo-tagged tweets. In our analysis, we (i) examine the impact of location, time, and distance on the adoption of hashtags, which is important for understanding meme diffusion and information propagation; (ii) examine the spatial propagation of hashtags through their focus, entropy, and spread; and (iii) present two methods that leverage the spatio-temporal propagation of hashtags to characterize locations. Based on this study, we find that although hashtags are a global phenomenon, the physical distance between locations is a strong constraint on the adoption of hashtags, both in terms of the hashtags shared between locations and in the timing of when these hashtags are adopted. We find both spatial and temporal locality as most hashtags spread over small geographical areas but at high speeds. We also find that hashtags are mostly a local phenomenon with long-tailed life spans. These (and other) findings have important implications for a variety of systems and applications, including targeted advertising, location-based services, social media search, and content delivery networks.
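As a rough illustration of two of the spatial measures named in the abstract above, the sketch below computes a focus and an entropy for one hashtag from the locations of its occurrences. The abstract does not define these measures, so the definitions here are assumptions: focus is taken as the single location with the largest share of occurrences, and entropy as the Shannon entropy (in bits) of the distribution over locations. The function name and the sample location labels are hypothetical.

```python
import math
from collections import Counter

def focus_and_entropy(locations):
    """Return (focus_location, focus_share, entropy_bits) for one hashtag.

    `locations` holds one location label per hashtag occurrence.
    Assumed definitions (not taken from the abstract):
      focus   = location with the largest share of occurrences
      entropy = Shannon entropy of the location distribution, in bits
    """
    counts = Counter(locations)
    total = sum(counts.values())
    probs = {loc: n / total for loc, n in counts.items()}
    focus_loc = max(probs, key=probs.get)
    entropy = sum(p * math.log2(1 / p) for p in probs.values())
    return focus_loc, probs[focus_loc], entropy

# A hashtag seen twice in one place and once in each of two others.
print(focus_and_entropy(["austin", "austin", "houston", "dallas"]))
```

Under these assumed definitions, a hashtag confined to one location has a focus share of 1 and an entropy of 0, while a hashtag spread evenly over many locations has a low focus share and a high entropy.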

international world wide web conferences | 2010

The social honeypot project: protecting online communities from spammers

Kyumin Lee; James Caverlee; Steve Webb

We present the conceptual framework of the Social Honeypot Project for uncovering social spammers who target online communities and initial empirical results from Twitter and MySpace. Two of the key components of the Social Honeypot Project are: (1) The deployment of social honeypots for harvesting deceptive spam profiles from social networking communities; and (2) Statistical analysis of the properties of these spam profiles for creating spam classifiers to actively filter out existing and new spammers.

international conference on web services | 2006

Process Mining, Discovery, and Integration using Distance Measures

Joonsoo Bae; Ling Liu; James Caverlee; William B. Rouse

Business processes continue to play an important role in today's service-oriented enterprise computing systems. Mining, discovering, and integrating process-oriented services has attracted growing attention in recent years. In this paper we present a quantitative approach to modeling and capturing the similarity and dissimilarity between different process designs. We derive the similarity measures by analyzing the process dependency graphs of the participating workflow processes. We first convert each process dependency graph into a normalized process matrix. Then we calculate the metric space distance between the normalized matrices. This distance measure can be used as a quantitative and qualitative tool in process mining, process merging, and process clustering, and ultimately it can reduce or minimize the costs involved in design, analysis, and evolution of workflow systems.

International Journal of Web Services Research | 2007

Development of Distance Measures for Process Mining, Discovery and Integration

Joonsoo Bae; Ling Liu; James Caverlee; Liang-Jie Zhang; Hyerim Bae

Business processes continue to play an important role in today's service-oriented enterprise computing systems. Mining, discovering, and integrating process-oriented services has attracted growing attention in recent years. In this article, we present a quantitative approach to modeling and capturing the similarity and dissimilarity between different process designs. We derive the similarity measures by analyzing the process dependency graphs of the participating workflow processes. We first convert each process dependency graph into a normalized process matrix. Then we calculate the metric space distance between the normalized matrices. This distance measure can be used as a quantitative and qualitative tool in process mining, process merging, and process clustering, and ultimately it can reduce or minimize the costs involved in design, analysis, and evolution of workflow systems.
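The two steps described in the abstract above (graph to matrix, then a distance between matrices) can be sketched as follows. This is only a minimal illustration under stated assumptions, not the authors' method: "normalization" is assumed here to mean aligning both dependency graphs onto the same sorted set of activities, and the distance is assumed to be the Frobenius (Euclidean) norm of the matrix difference. The function names and the sample processes are hypothetical.

```python
import math

def process_matrix(edges, activities):
    """Adjacency matrix of a dependency graph over a fixed activity order."""
    index = {a: i for i, a in enumerate(activities)}
    matrix = [[0.0] * len(activities) for _ in activities]
    for src, dst in edges:
        matrix[index[src]][index[dst]] = 1.0
    return matrix

def process_distance(edges_a, edges_b):
    """Frobenius distance between two processes (assumed metric)."""
    # Assumed normalization: both matrices share the union of activities,
    # in sorted order, so that cells are directly comparable.
    all_edges = list(edges_a) + list(edges_b)
    activities = sorted({a for edge in all_edges for a in edge})
    ma = process_matrix(edges_a, activities)
    mb = process_matrix(edges_b, activities)
    squared = sum(
        (x - y) ** 2
        for row_a, row_b in zip(ma, mb)
        for x, y in zip(row_a, row_b)
    )
    return math.sqrt(squared)

# Two three-activity processes that differ in one dependency.
sequential = [("A", "B"), ("B", "C")]
parallel = [("A", "B"), ("A", "C")]
print(process_distance(sequential, parallel))
```

With a 0/1 adjacency encoding, the squared distance counts the dependencies present in one design but not the other, so identical designs are at distance 0.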

international conference on data engineering | 2004

Probe, cluster, and discover: focused extraction of QA-Pagelets from the deep Web

James Caverlee; Ling Liu; David Buttler

We introduce the concept of a QA-Pagelet to refer to the content region in a dynamic page that contains query matches. We present THOR, a scalable and efficient mining system for discovering and extracting QA-Pagelets from the deep Web. A unique feature of THOR is its two-phase extraction framework. In the first phase, pages from a deep Web site are grouped into distinct clusters of structurally-similar pages. In the second phase, pages from each page cluster are examined through a subtree filtering algorithm that exploits the structural and content similarity at subtree level to identify the QA-Pagelets.
