Matthew S. Gerber | Researchain

Publication

Featured research published by Matthew S. Gerber.

Decision Support Systems | 2014

Predicting crime using Twitter and kernel density estimation

Matthew S. Gerber

Twitter is used extensively in the United States as well as globally, creating many opportunities to augment decision support systems with Twitter-driven predictive analytics. Twitter is an ideal data source for decision support: its users, who number in the millions, publicly discuss events, emotions, and innumerable other topics; its content is authored and distributed in real time at no charge; and individual messages (also known as tweets) are often tagged with precise spatial and temporal coordinates. This article presents research investigating the use of spatiotemporally tagged tweets for crime prediction. We use Twitter-specific linguistic analysis and statistical topic modeling to automatically identify discussion topics across a major city in the United States. We then incorporate these topics into a crime prediction model and show that, for 19 of the 25 crime types we studied, the addition of Twitter data improves crime prediction performance versus a standard approach based on kernel density estimation. We identify a number of performance bottlenecks that could impact the use of Twitter in an actual decision support system. We also point out important areas of future work for this research, including deeper semantic analysis of message con

International Conference on Social Computing | 2012

Automatic crime prediction using events extracted from twitter posts

Xiaofeng Wang; Matthew S. Gerber; Donald E. Brown

Prior work on criminal incident prediction has relied primarily on the historical crime record and various geospatial and demographic information sources. Although promising, these models do not take into account the rich and rapidly expanding social media context that surrounds incidents of interest. This paper presents a preliminary investigation of Twitter-based criminal incident prediction. Our approach is based on the automatic semantic analysis and understanding of natural language Twitter posts, combined with dimensionality reduction via latent Dirichlet allocation and prediction via linear modeling. We tested our model on the task of predicting future hit-and-run crimes. Evaluation results indicate that the model comfortably outperforms a baseline model that predicts hit-and-run incidents uniformly across all days.

Computational Linguistics | 2012

Semantic role labeling of implicit arguments for nominal predicates

Matthew S. Gerber; Joyce Y. Chai

Nominal predicates often carry implicit arguments. Recent work on semantic role labeling has focused on identifying arguments within the local context of a predicate; implicit arguments, however, have not been systematically examined. To address this limitation, we have manually annotated a corpus of implicit arguments for ten predicates from NomBank. Through analysis of this corpus, we find that implicit arguments add 71% to the argument structures that are present in NomBank. Using the corpus, we train a discriminative model that is able to identify implicit arguments with an F1 score of 50%, significantly outperforming an informed baseline model. This article describes our investigation, explores a wide variety of features important for the task, and discusses future directions for work on implicit argument identification.

Intelligence and Security Informatics | 2012

Spatio-temporal modeling of criminal incidents using geographic, demographic, and twitter-derived information

Xiaofeng Wang; Donald E. Brown; Matthew S. Gerber

Personal and property crimes create large economic losses within the United States. To prevent crimes, law enforcement agencies model the spatio-temporal pattern of criminal incidents. In this paper, we present a new modeling process that combines two of our recently developed approaches for modeling criminal incidents. The first component of the process is the spatio-temporal generalized additive model (STGAM), which predicts the probability of criminal activity at a given location and time using a feature-based approach. The second component involves textual analysis. In our experiments, we automatically analyzed Twitter posts, which provide a rich, event-based context for criminal incidents. In addition, we describe a new feature selection method to identify important features. We applied our new model to actual criminal incidents in Charlottesville, Virginia. Our results indicate that the STGAM/Twitter model outperforms our previous STGAM model, which did not use Twitter information. The STGAM/Twitter model can be generalized to other applications of event modeling where unstructured text is available.

Security Informatics | 2014

Automatic detection of cyber-recruitment by violent extremists

Jacob R. Scanlon; Matthew S. Gerber

Growing use of the Internet as a major means of communication has led to the formation of cyber-communities, which have become increasingly appealing to terrorist groups due to the unregulated nature of Internet communication. Online communities enable violent extremists to increase recruitment by allowing them to build personal relationships with a worldwide audience capable of accessing uncensored content. This article presents methods for identifying the recruitment activities of violent groups within extremist social media websites. Specifically, these methods apply known techniques within supervised learning and natural language processing to the untested task of automatically identifying forum posts intended to recruit new violent extremist members. We used data from the western jihadist website Ansar AlJihad Network, which was compiled by the University of Arizona’s Dark Web Project. Multiple judges manually annotated a sample of these data, marking 192 randomly sampled posts as recruiting (Yes) or non-recruiting (No). We observed significant agreement between the judges’ labels; Cohen’s κ=(0.5,0.9) at p=0.01. We tested the feasibility of using naive Bayes models, logistic regression, classification trees, boosting, and support vector machines (SVM) to classify the forum posts. Evaluation with receiver operating characteristic (ROC) curves shows that our SVM classifier achieves an 89% area under the curve (AUC), a significant improvement over the 63% AUC performance achieved by our simplest naive Bayes model (Tukey’s test at p=0.05). To our knowledge, this is the first result reported on this task, and our analysis indicates that automatic detection of online terrorist recruitment is a feasible task. We also identify a number of important areas of future work including classifying non-English posts and measuring how recruitment posts and current events change membership numbers over time.
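The inter-judge agreement statistic mentioned in this abstract, Cohen's κ, can be illustrated with a minimal sketch. The labels below are invented for illustration and are not the study's annotations; the function name `cohens_kappa` is also an assumption, not something taken from the paper.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items on which the judges agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each judge's marginal label rates.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both judges used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical annotations of eight forum posts (Yes = recruiting).
judge_1 = ["Yes", "Yes", "No", "No", "Yes", "No", "No", "No"]
judge_2 = ["Yes", "No", "No", "No", "Yes", "No", "Yes", "No"]
kappa = cohens_kappa(judge_1, judge_2)
print(round(kappa, 3))
```

With these made-up labels the judges agree on 6 of 8 posts, and κ discounts the portion of that agreement expected by chance.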

Ubiquitous Computing | 2016

Sensus: a cross-platform, general-purpose system for mobile crowdsensing in human-subject studies

Haoyi Xiong; Yu Huang; Laura E. Barnes; Matthew S. Gerber

The burden of entry into mobile crowdsensing (MCS) is prohibitively high for human-subject researchers who lack a technical orientation. As a result, the benefits of MCS remain beyond the reach of research communities (e.g., psychologists) whose expertise in the study of human behavior might advance applications and understanding of MCS systems. This paper presents Sensus, a new MCS system for human-subject studies that bridges the gap between human-subject researchers and MCS methods. Sensus alleviates technical burdens with on-device, GUI-based design of sensing plans, simple and efficient distribution of sensing plans to study participants, and uniform participant experience across iOS and Android devices. Sensing plans support many hardware and software sensors, automatic deployment of sensor-triggered surveys, and double-blind assignment of participants within randomized controlled trials. Sensus offers these features to study designers without requiring knowledge of markup and programming languages. We demonstrate the feasibility of using Sensus within two human-subject studies, one in psychology and one in engineering. Feedback from non-technical users indicates that Sensus is an effective and low-burden system for MCS-based data collection and analysis.

Systems and Information Engineering Design Symposium | 2014

Automated prediction of adverse post-surgical outcomes

Katharine Hergenroeder; Timothy Carroll; Alec Chen; Caroline Iurillo; Peter T. W. Kim; Zachary Terner; Matthew S. Gerber; Donald E. Brown

Patients undergoing surgery can experience a range of adverse events, such as renal and cardiac injury, respiratory failure, and death. This study focuses on discovering relationships between perioperative physiological data and adverse post-surgical outcomes, with the goal of developing strategies to reduce the severity and frequency of these conditions. Analyzing patients' preoperative demographic data, such as age and race, and perioperative physiologic data, such as blood pressure and anesthesia dosage, we use statistical models to predict whether a patient under anesthesia will develop renal or cardiac injury, respiratory failure, or death. Specifically, we compare generalized linear models, random forest models, and L1 regularized logistic regression models in predicting these adverse events. For each event, the random forest model generally outperformed its competitors, as shown in receiver operating characteristic (ROC) curves and evidenced by the higher area under the curve (AUC) values of 0.85, 0.86, 0.85, and 0.82 for death, renal injury, respiratory failure, and cardiac injury, respectively. However, score tables indicate that at certain thresholds, the L1 regularized logistic regression predicts fewer false negatives than the random forest models. In general, our findings show the existence of a relationship between perioperative predictors and post-surgical complications. This relationship could provide the foundation for a surveillance and alert system.
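The two evaluation views in this abstract, a threshold-free AUC and a threshold-specific false-negative count, can be contrasted with a small sketch. The labels and scores are invented and the function names are assumptions; the AUC here is computed by pairwise ranking, which is one standard way to obtain it and not necessarily the procedure used in the study.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def false_negatives(labels, scores, threshold):
    """Count positives whose score falls below the alert threshold."""
    return sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)


# Hypothetical outcomes (1 = adverse event) and model risk scores.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.4]

auc = roc_auc(labels, scores)
fn_at_half = false_negatives(labels, scores, 0.5)
print(round(auc, 3), fn_at_half)
```

A model with the higher AUC can still miss more positives at a particular threshold, which is why the abstract reports both views.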

International Joint Conference on Natural Language Processing | 2015

Model Adaptation for Personalized Opinion Analysis

Mohammad Al Boni; Keira Qi Zhou; Hongning Wang; Matthew S. Gerber

Humans are idiosyncratic and variable: towards the same topic, they might hold different opinions or express the same opinion in various ways. It is hence important to model opinions at the level of individual users; however, it is impractical to estimate independent sentiment classification models for each user with limited data. In this paper, we adopt a model-based transfer learning solution – using linear transformations over the parameters of a generic model – for personalized opinion analysis. Extensive experimental results on a large collection of Amazon reviews confirm our method significantly outperformed a user-independent generic opinion model as well as several state-of-the-art transfer learning algorithms.

IEEE Transactions on Information Forensics and Security | 2015

Forecasting Violent Extremist Cyber Recruitment

Jacob R. Scanlon; Matthew S. Gerber

The Internet's increasing use as a means of communication has led to the formation of cyber communities, which have become appealing to violent extremist (VE) groups. This paper presents research on forecasting the daily level of cyber-recruitment activity of VE groups. We used a previously developed support vector machine model to identify recruitment posts within a Western jihadist discussion forum. We analyzed the textual content of this data set with latent Dirichlet allocation (LDA), and we fed these analyses into a variety of time series models to forecast cyber-recruitment activity within the forum. Quantitative evaluations showed that employing LDA-based topics as predictors within time series models reduces forecast error compared with naive (random-walk), autoregressive integrated moving average, and exponential smoothing baselines. To the best of our knowledge, this is the first result reported on this forecasting task. This research could ultimately assist with efficient allocation of intelligence analysts in response to predicted levels of cyber-recruitment activity.
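The naive (random-walk) baseline named in this abstract simply predicts that tomorrow's value equals today's. A minimal sketch follows, using invented daily counts and mean absolute error as the error measure; the abstract does not state which error measure the paper uses, so that choice is an assumption.

```python
def random_walk_forecasts(series):
    """One-step-ahead naive forecasts: each day's forecast is the previous day's value."""
    if len(series) < 2:
        raise ValueError("need at least two observations")
    return list(series[:-1])


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and forecast values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical daily counts of recruitment posts.
daily_counts = [4, 6, 5, 9, 7]

forecasts = random_walk_forecasts(daily_counts)  # forecasts for days 2..5
actual = daily_counts[1:]                        # observed values for days 2..5
mae = mean_absolute_error(actual, forecasts)
print(mae)
```

Any richer model, such as one using topic proportions as predictors, would need to achieve a lower error than this baseline on the same days to be considered an improvement.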

Systems and Information Engineering Design Symposium | 2013

Assessment of machine learning algorithms in cloud computing frameworks

Kevin Li; Charles Gibson; David Ho; Qi Zhou; Jason Kim; Omar Buhisi; Donald E. Brown; Matthew S. Gerber

In the past decade, digitization of information has led to a data explosion in both volume and complexity. While traditional computing frameworks have failed to provide adequate computing power for the now common data-intensive computing tasks, cloud computing provides an effective alternative to enhance computing power. Machine learning algorithms are powerful analytical methods that allow machines to recognize patterns and facilitate human learning. However, the performance of individual machine learning algorithms within each cloud computing framework remains largely unknown. Furthermore, the lack of a robust selection methodology matching input data with effective machine learning algorithms limits the ability of practitioners to make effective use of cloud computing. This research compares various machine learning algorithms on the widely adopted Apache Mahout framework and the recently introduced GraphLab framework. Whereas previous work has examined the computational architectures of various cloud computing frameworks, this work focuses on a problem-based approach to architecture selection. The experimental results demonstrate that GraphLab generally outperforms Mahout with respect to runtime, scalability, and usability. However, Mahout outperforms GraphLab when the experiment focus shifts to error measurement.