Moshe Koppel | Researchain

Publication

Featured research published by Moshe Koppel.

Literary and Linguistic Computing | 2002

Automatically Categorizing Written Texts by Author Gender

Moshe Koppel; Shlomo Argamon; Anat Rachel Shimoni

The problem of automatically determining the gender of a document's author would appear to be a more subtle problem than those of categorization by topic or authorship attribution. Nevertheless, it is shown that automated text categorization techniques can exploit combinations of simple lexical and syntactic features to infer the gender of the author of an unseen formal written document with approximately 80 per cent accuracy. The same techniques can be used to determine if a document is fiction or non-fiction with approximately 98 per cent accuracy.

Communications of the ACM | 2009

Automatically profiling the author of an anonymous text

Shlomo Argamon; Moshe Koppel; James W. Pennebaker; Jonathan Schler

Imagine that you have been given an important text of unknown authorship, and wish to know as much as possible about the unknown author (demographics, personality, cultural background, among others), just by analyzing the given text. This authorship profiling problem is of growing importance in the current global information environment; applications abound in forensics, security, and commercial settings. For example, authorship profiling can help police identify characteristics of the perpetrator of a crime when there are too few (or too many) specific suspects to consider. Similarly, large corporations may be interested in knowing what types of people like or dislike their products, based on analysis of blogs and online product reviews. The question we therefore ask is: How much can we discern about the author of a text simply by analyzing the text itself? It turns out that, with varying degrees of accuracy, we can say a great deal indeed. Unlike the problem of authorship attribution (determining the author of a text from a given candidate set) discussed recently in these pages by Li, Zheng, and Chen, authorship profiling does not begin with a set of writing samples from known candidate authors. Instead, we exploit the sociolinguistic observation that different groups of people speaking or writing in a particular genre and in a particular language use that language differently. That is, they vary in how often they use certain words or syntactic constructions (in addition to variation in pronunciation or intonation, for example). The particular profile dimensions we consider here are author gender, age [8], native language [7], and personality [10].

Language Resources and Evaluation | 2011

Authorship attribution in the wild

Moshe Koppel; Jonathan Schler; Shlomo Argamon

Most previous work on authorship attribution has focused on the case in which we need to attribute an anonymous document to one of a small set of candidate authors. In this paper, we consider authorship attribution as found in the wild: the set of known candidates is extremely large (possibly many thousands) and might not even include the actual author. Moreover, the known texts and the anonymous texts might be of limited length. We show that even in these difficult cases, we can use similarity-based methods along with multiple randomized feature sets to achieve high precision. Moreover, we show the precise relationship between attribution precision and four parameters: the size of the candidate set, the quantity of known text by the candidates, the length of the anonymous text, and a certain robustness score associated with an attribution.
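The idea of combining similarity with multiple randomized feature sets can be illustrated with a minimal sketch. It is not the authors' implementation: the character 4-gram features, cosine similarity, the function names (`char_ngrams`, `cosine`, `attribute`), and the default values for `trials`, `fraction`, and `threshold` are all assumptions made for illustration. The fraction of trials won by the top candidate stands in for the robustness score, and the function abstains when that score is too low.

```python
import math
import random
from collections import Counter


def char_ngrams(text, n=4):
    """Count overlapping character n-grams (an assumed feature type)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a, b, features):
    """Cosine similarity computed only over the given feature subset."""
    dot = sum(a[f] * b[f] for f in features)
    norm_a = math.sqrt(sum(a[f] ** 2 for f in features))
    norm_b = math.sqrt(sum(b[f] ** 2 for f in features))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def attribute(anonymous, known_texts, trials=100, fraction=0.4,
              threshold=0.9, seed=0):
    """Return (author or None, score); None means no guess is ventured."""
    rng = random.Random(seed)
    anon = char_ngrams(anonymous)
    profiles = {a: char_ngrams(t) for a, t in known_texts.items()}
    vocabulary = sorted(set(anon).union(*profiles.values()))
    if not profiles or not vocabulary:
        return None, 0.0
    sample_size = max(1, int(fraction * len(vocabulary)))
    wins = Counter()
    for _ in range(trials):
        # Each trial sees only a random subset of the feature set.
        subset = rng.sample(vocabulary, sample_size)
        best = max(sorted(profiles),
                   key=lambda a: cosine(anon, profiles[a], subset))
        wins[best] += 1
    author, count = wins.most_common(1)[0]
    score = count / trials  # share of trials won by the top candidate
    return (author if score >= threshold else None), score
```

Raising `threshold` trades coverage for precision: fewer documents receive an attribution, but the attributions that are made rest on agreement across more of the random feature subsets.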

Computational Intelligence | 2006

The Importance of Neutral Examples for Learning Sentiment

Moshe Koppel; Jonathan Schler

Most research on learning to identify sentiment ignores “neutral” examples, learning only from examples of significant (positive or negative) polarity. We show that it is crucial to use neutral examples in learning polarity for a variety of reasons. Learning from negative and positive examples alone will not permit accurate classification of neutral examples. Moreover, the use of neutral training examples in learning facilitates better distinction between positive and negative examples.

Knowledge Discovery and Data Mining | 2005

Determining an author's native language by mining a text for errors

Moshe Koppel; Jonathan Schler; Kfir Zigdon

In this paper, we show that stylistic text features can be exploited to determine an anonymous author's native language with high accuracy. Specifically, we first use automatic tools to ascertain frequencies of various stylistic idiosyncrasies in a text. These frequencies then serve as features for support vector machines that learn to classify texts according to the author's native language.

Computing Attitude and Affect in Text | 2006

Good News or Bad News? Let the Market Decide

Moshe Koppel; Itai Shtrimberg

A simple and novel method for generating labeled examples for sentiment analysis is introduced: news stories about publicly traded companies are labeled positive or negative according to price changes of the company stock. It is shown that there are many lexical markers for bad news but none for good news. Overall, learned models based on lexical features can distinguish good news from bad news with accuracy of about 70%. Unfortunately, this result does not yield profits since it works only when stories are labeled according to cotemporaneous price changes but does not work when they are labeled according to subsequent price changes.

International ACM SIGIR Conference on Research and Development in Information Retrieval | 2006

Authorship attribution with thousands of candidate authors

Moshe Koppel; Jonathan Schler; Shlomo Argamon; Eran Messeri

In this paper, we use a blog corpus to demonstrate that we can often identify the author of an anonymous text even where there are many thousands of candidate authors. Our approach combines standard information retrieval methods with a text categorization meta-learning scheme that determines when to even venture a guess.

Knowledge and Information Systems | 2001

Arbitrating among competing classifiers using learned referees

Julio Ortega; Moshe Koppel; Shlomo Argamon

The situation in which the results of several different classifiers and learning algorithms are obtainable for a single classification problem is common. In this paper, we propose a method that takes a collection of existing classifiers and learning algorithms, together with a set of available data, and creates a combined classifier that takes advantage of all of these sources of knowledge. The basic idea is that each classifier has a particular subdomain for which it is most reliable. Therefore, we induce a referee for each classifier, which describes its area of expertise. Given such a description, we arbitrate between the component classifiers by using the most reliable classifier for the examples in each subdomain. In experiments in several domains, we found such arbitration to be significantly more effective than various voting techniques which do not seek out subdomains of expertise. Our results further suggest that the more fine-grained the analysis of the areas of expertise of the competing classifiers, the more effectively they can be combined. In particular, we find that classification accuracy increases greatly when using intermediate subconcepts from the classifiers themselves as features for the induction of referees.
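A toy sketch of arbitration by referees is given here under simplifying assumptions. The abstract describes inducing a referee for each classifier; this sketch swaps in a much simpler stand-in, a per-region accuracy table, and relies on a hypothetical caller-supplied `region_of` function to define the subdomains. The names `train_referee` and `arbitrate` and the `default` reliability of 0.5 are likewise illustrative, not taken from the paper.

```python
from collections import defaultdict


def train_referee(classifier, examples, labels, region_of):
    """Estimate how reliable a classifier is within each region.

    Stand-in for an induced referee: a table mapping each region
    to the classifier's accuracy on training examples in that region.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for x, y in zip(examples, labels):
        region = region_of(x)
        total[region] += 1
        correct[region] += classifier(x) == y  # True counts as 1
    return {region: correct[region] / total[region] for region in total}


def arbitrate(x, classifiers, referees, region_of, default=0.5):
    """Classify x with the classifier whose referee rates it most reliable."""
    region = region_of(x)
    best = max(range(len(classifiers)),
               key=lambda i: referees[i].get(region, default))
    return classifiers[best](x)
```

Unlike majority voting, the combined prediction here comes from a single component classifier, chosen per example according to where each one has been observed to be reliable.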

Bulletin of Mathematical Biology | 1990

The cellular computer DNA: Program or data

Henri Atlan; Moshe Koppel

The classical metaphor of the genetic program written in the DNA nucleotidic sequences is reconsidered. Recent works on algorithmic complexity and logical properties of computer programs and data are used to question the explanatory value of that metaphor. Structural properties of strings are looked for which would be necessary to apply to DNA sequences if the metaphor is to be taken literally. The notion of sophistication is used to quantify meaningful complexity and to distinguish it from classical computational complexity. In this context, the distinction between program and data becomes relevant and an alternative metaphor of DNA as data to a parallel computing network embedded in the global geometrical and biochemical structure of the cell is discussed. An intermediate picture of an evolving network emerges as the most likely where the output of the cellular computing network can produce, at a different time scale, changes in the structure of the network itself by means of changes in the DNA activity patterns.

Journal of the Association for Information Science and Technology | 2014

Determining if two documents are written by the same author

Moshe Koppel; Yaron Winter

Almost any conceivable authorship attribution problem can be reduced to one fundamental problem: whether a pair of (possibly short) documents were written by the same author. In this article, we offer an (almost) unsupervised method for solving this problem with surprisingly high accuracy. The main idea is to use repeated feature subsampling methods to determine if one document of the pair allows us to select the other from among a background set of “impostors” in a sufficiently robust manner.
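The impostors idea can be sketched roughly as follows. This is an illustration under stated assumptions rather than the published method: character 4-gram counts, min-max similarity, the function names (`ngram_counts`, `minmax`, `same_author`), and the defaults for `trials`, `fraction`, and `threshold` are all choices made for this example. In each trial a random feature subset is drawn, and the trial counts as a hit only if the second document is strictly more similar to the first than every impostor is.

```python
import random
from collections import Counter


def ngram_counts(text, n=4):
    """Count overlapping character n-grams (an assumed feature type)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def minmax(a, b, features):
    """Min-max similarity over a feature subset (an assumed choice)."""
    top = sum(min(a[f], b[f]) for f in features)
    bottom = sum(max(a[f], b[f]) for f in features)
    return top / bottom if bottom else 0.0


def same_author(doc_x, doc_y, impostors, trials=100, fraction=0.5,
                threshold=0.8, seed=0):
    """Return (decision, score) for whether doc_x and doc_y share an author."""
    if not impostors:
        raise ValueError("at least one impostor document is required")
    rng = random.Random(seed)
    x, y = ngram_counts(doc_x), ngram_counts(doc_y)
    fakes = [ngram_counts(t) for t in impostors]
    vocabulary = sorted(set(x).union(y, *fakes))
    if not vocabulary:
        return False, 0.0
    sample_size = max(1, int(fraction * len(vocabulary)))
    hits = 0
    for _ in range(trials):
        # Repeated feature subsampling: each trial uses a fresh random subset.
        subset = rng.sample(vocabulary, sample_size)
        target = minmax(x, y, subset)
        if all(target > minmax(x, fake, subset) for fake in fakes):
            hits += 1
    score = hits / trials  # share of trials in which doc_y beat all impostors
    return score >= threshold, score
```

The quality of the decision depends heavily on the impostor set: impostors that resemble the documents in genre and topic make the test harder to pass and therefore more informative.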
