Justin Martineau | Researchain

Publication

Featured research published by Justin Martineau.

conference on information and knowledge management | 2009

Improving binary classification on text problems using differential word features

Justin Martineau; Tim Finin; Anupam Joshi; Shamit Patel

We describe an efficient technique to weigh word-based features in binary classification tasks and show that it significantly improves classification accuracy on a range of problems. The most common text classification approach uses a document's ngrams (words and short phrases) as its features and assigns feature values equal to their frequency or TFIDF score relative to the training corpus. Our approach uses values computed as the product of an ngram's document frequency and the difference of its inverse document frequencies in the positive and negative training sets. While this technique is remarkably easy to implement, it gives a statistically significant improvement over the standard bag-of-words approaches using support vector machines on a range of classification tasks. Our results show that our technique is robust and broadly applicable. We provide an analysis of why the approach works and how it can generalize to other domains and problems.
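The weighting described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the add-one smoothing, the base-2 logarithm, and the reading of "document frequency" as the ngram's count within the document being featurized are all assumptions made for illustration.

```python
import math
from collections import Counter


def delta_idf_weights(pos_docs, neg_docs, smoothing=1.0):
    """Return, per term, IDF in the positive set minus IDF in the negative set.

    Each document is a list of tokens (ngrams). The smoothing term is an
    assumption added here to avoid division by zero for terms seen in
    only one class.
    """
    # Number of documents in each class that contain the term at least once.
    pos_df = Counter(t for doc in pos_docs for t in set(doc))
    neg_df = Counter(t for doc in neg_docs for t in set(doc))
    n_pos, n_neg = len(pos_docs), len(neg_docs)

    weights = {}
    for term in set(pos_df) | set(neg_df):
        idf_pos = math.log2((n_pos + smoothing) / (pos_df[term] + smoothing))
        idf_neg = math.log2((n_neg + smoothing) / (neg_df[term] + smoothing))
        weights[term] = idf_pos - idf_neg
    return weights


def featurize(doc, weights):
    """Feature value = in-document count of the term times its delta-IDF weight."""
    counts = Counter(doc)
    return {t: c * weights[t] for t, c in counts.items() if t in weights}


if __name__ == "__main__":
    pos = [["great", "movie"], ["great", "fun"]]
    neg = [["bad", "movie"], ["bad", "boring"]]
    w = delta_idf_weights(pos, neg)
    print(featurize(["great", "great", "movie"], w))
```

Under this sign convention, terms concentrated in the positive set receive negative values, terms concentrated in the negative set receive positive values, and terms spread evenly across both sets score zero. The resulting feature dictionaries could then be passed to any linear classifier in place of plain frequency or TFIDF values.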

computational science and engineering | 2009

The Geolocation of Web Logs from Textual Clues

Clayton Fink; Christine D. Piatko; James Mayfield; Danielle Chou; Tim Finin; Justin Martineau

Understanding the spatial distribution of people who author social media content is of growing interest for researchers and commerce. Blogging platforms depend on authors reporting their own location. However, not all authors report or reveal their location on their blog’s home page. Automated geolocation strategies using IP address and domain name are not adequate for determining an author’s location because most blogs are not self-hosted. In this paper we describe a method that uses the place name mentions in a blog to determine an author’s location. We achieved an accuracy of 63% on a collection of 844 blogs with known locations.

north american chapter of the association for computational linguistics | 2016

Clustering for Simultaneous Extraction of Aspects and Features from Reviews

Lu Chen; Justin Martineau; Doreen Cheng; Amit P. Sheth

This paper presents a clustering approach that simultaneously identifies product features and groups them into aspect categories from online reviews. Unlike prior approaches that first extract features and then group them into categories, the proposed approach combines feature and aspect discovery instead of chaining them. In addition, prior work on feature extraction tends to require seed terms and focus on identifying explicit features, while the proposed approach extracts both explicit and implicit features, and does not require seed terms. We evaluate this approach on reviews from three domains. The results show that it outperforms several state-of-the-art methods on both tasks across all three domains.

north american chapter of the association for computational linguistics | 2015

Samsung: Align-and-Differentiate Approach to Semantic Textual Similarity

Lushan Han; Justin Martineau; Doreen Cheng; Christopher Thomas

This paper describes our Align-and-Differentiate approach to the SemEval 2015 Task 2 competition for English Semantic Textual Similarity (STS) systems. Our submission achieved the top place on two of the five evaluation datasets. Our team placed 3rd among 28 participating teams, and our three runs ranked 4th, 6th and 7th among the 73 runs submitted by the 28 teams. Our approach improves upon the UMBC PairingWords system by semantically differentiating distributionally similar terms. This novel addition improves results by 2.5 points on the Pearson correlation measure.

meeting of the association for computational linguistics | 2014

Active Learning with Efficient Feature Weighting Methods for Improving Data Quality and Classification Accuracy

Justin Martineau; Lu Chen; Doreen Cheng; Amit P. Sheth

Many machine learning datasets are noisy with a substantial number of mislabeled instances. This noise yields sub-optimal classification performance. In this paper we study a large, low-quality annotated dataset, created quickly and cheaply using Amazon Mechanical Turk to crowdsource annotations. We describe computationally cheap feature weighting techniques and a novel non-linear distribution spreading algorithm that can be used to iteratively and interactively correct mislabeled instances to significantly improve annotation quality at low cost. Eight different emotion extraction experiments on Twitter data demonstrate that our approach is just as effective as more computationally expensive techniques. Our techniques save a considerable amount of time.

machine learning and data mining in pattern recognition | 2013

TISA: topic independence scoring algorithm

Justin Martineau; Doreen Cheng; Tim Finin

Textual analysis using machine learning is in high demand for a wide range of applications including recommender systems, business intelligence tools, and electronic personal assistants. Some of these applications need to operate over a wide and unpredictable array of topic areas, but current in-domain, domain adaptation, and multi-domain approaches cannot adequately support this need, due to their low accuracy on topic areas that they are not trained for, slow adaptation speed, or high implementation and maintenance costs. To create a true domain-independent solution, we introduce the Topic Independence Scoring Algorithm (TISA) and demonstrate how to build a domain-independent bag-of-words model for sentiment analysis. This model is the best performing sentiment model published on the popular 25-category Amazon product reviews dataset. The model is on average 89.6% accurate as measured on 20 held-out test topic areas. This compares very favorably with the 82.28% average accuracy of the 20 baseline in-domain models. Moreover, the TISA model is highly uniformly accurate, with a variance of 5 percentage points, which provides strong assurance that the model will be just as accurate on new topic areas. Consequently, TISA's models are truly domain independent. In other words, they require no changes or human intervention to accurately classify documents in never-before-seen topic areas.

international health informatics symposium | 2012

Sub-cellular feature detection and automated extraction of collocalized actin and myosin regions

Justin Martineau; Ronil Mokashi; David R. Chapman; Michael A. Grasso; Mary Brady; Yelena Yesha; Yaacov Yesha; Antonio Cardone; Alden A. Dima

We describe a new distance-based metric to measure the strength of collocalization in multi-color microscopy images for user-selected regions. This metric helps to standardize, objectify, quantify, and even automate light microscopy observations. Our new algorithm uses this metric to automatically identify and annotate a donut-shaped actomyosin stress fiber bundle evident in vascular smooth muscle cells on certain types of surfaces. Both the metric and the algorithm have been implemented as an open source plugin for the popular ImageJ toolkit. They are available for download at http://code.google.com/p/actin-myosin-plugin/. Using cells stained for the cytoskeletal proteins actin and myosin, we show how characteristics of the identified stress fiber bundle are indicative of the kind of surface the cell is placed upon, and prove that weak spots in this structure are correlated with local membrane extensions. Given the relationship between membrane extension, cell migration, vascular disease, embryonic development, and cancer metastasis, we provide these tools to enable biological research that could improve our quality of life.

north american chapter of the association for computational linguistics | 2010