
Publications

Featured research published by Eric M. Clark.


Proceedings of the National Academy of Sciences of the United States of America | 2015

Human language reveals a universal positivity bias

Peter Sheridan Dodds; Eric M. Clark; Suma Desu; Morgan R. Frank; Andrew J. Reagan; Jake Ryland Williams; Lewis Mitchell; Kameron Decker Harris; Isabel M. Kloumann; James P. Bagrow; Karine Megerdoomian; Matthew T. McMahon; Brian F. Tivnan; Christopher M. Danforth

Significance: The most commonly used words of 24 corpora across 10 diverse human languages exhibit a clear positive bias, a big-data confirmation of the Pollyanna hypothesis. The study’s findings are based on 5 million individual human scores and pave the way for the development of powerful language-based tools for measuring emotion. Using human evaluation of 100,000 words spread across 24 corpora in 10 languages diverse in origin and culture, we present evidence of a deep imprint of human sociality in language, observing that (i) the words of natural human language possess a universal positivity bias, (ii) the estimated emotional content of words is consistent between languages under translation, and (iii) this positivity bias is strongly independent of frequency of word use. Alongside these general regularities, we describe interlanguage variations in the emotional spectrum of languages that allow us to rank corpora. We also show how our word evaluations can be used to construct physical-like instruments for both real-time and offline measurement of the emotional content of large-scale texts.


PLOS ONE | 2016

Vaporous Marketing: Uncovering Pervasive Electronic Cigarette Advertisements on Twitter.

Eric M. Clark; Christopher A. Jones; Jake Ryland Williams; Allison N. Kurti; Mitchell C. Norotsky; Christopher M. Danforth; Peter Sheridan Dodds

Background: Twitter has become the “wild west” of marketing and promotional strategies for advertisement agencies. Electronic cigarettes have been heavily marketed across Twitter feeds, offering discounts, “kid-friendly” flavors, algorithmically generated false testimonials, and free samples. Methods: All electronic-cigarette keyword-related tweets from a 10% sample of Twitter spanning January 2012 through December 2014 (approximately 850,000 total tweets) were identified and categorized as Automated or Organic by combining a keyword classification with a machine-trained human-detection algorithm. A sentiment analysis using hedonometrics was performed on Organic tweets to quantify the change in consumer sentiment over time. Commercialized tweets were topically categorized with key phrasal pattern matching. Results: The overwhelming majority (80%) of tweets were classified as automated or promotional in nature. The majority of these tweets were coded as commercialized (83.65% in 2013), up to 33% of which offered discounts or free samples and appeared on over a billion Twitter feeds as impressions. The positivity of Organic (human) classified tweets decreased over time (5.84 in 2013 to 5.77 in 2014) due to a relative increase in the negative words ‘ban’, ‘tobacco’, ‘doesn’t’, ‘drug’, ‘against’, ‘poison’, and ‘tax’, and a relative decrease in positive words like ‘haha’, ‘good’, and ‘cool’. Automated tweets are more positive than organic ones (6.17 versus 5.84) due to a relative increase in marketing words like ‘best’, ‘win’, ‘buy’, ‘sale’, ‘health’, and ‘discount’, and a relative decrease in negative words like ‘bad’, ‘hate’, ‘stupid’, and ‘don’t’. Conclusions: Given the youth presence on Twitter and the clinical uncertainty of the long-term health complications of electronic-cigarette consumption, the protection of public health warrants scrutiny and potential regulation of social media marketing.
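The hedonometric scores quoted above (e.g., 5.84 versus 6.17) are, in essence, frequency-weighted averages of per-word happiness ratings on a 1–9 scale. A minimal sketch of that computation follows; the word scores here are illustrative stand-ins, not the actual labMT values used in the study:

```python
from collections import Counter

# Illustrative happiness scores on a 1-9 scale (made-up stand-ins for
# the labMT word list used in hedonometric studies).
happiness = {"good": 7.9, "haha": 7.6, "cool": 7.0, "ban": 2.6, "poison": 2.0}

def hedonometer(text, scores=happiness):
    """Frequency-weighted average happiness of the scored words in `text`."""
    counts = Counter(text.lower().split())
    total = sum(n for w, n in counts.items() if w in scores)
    if total == 0:
        return None  # no scored words present
    return sum(scores[w] * n for w, n in counts.items() if w in scores) / total

print(round(hedonometer("haha good good cool"), 2))  # → 7.6
```

A corpus-level score is the same average taken over all word occurrences in a year's tweets, which is how a shift such as 5.84 → 5.77 can be traced back to individual words gaining or losing relative frequency.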


Journal of Computational Science | 2016

Sifting Robotic from Organic Text: A Natural Language Approach for Detecting Automation on Twitter

Eric M. Clark; Jake Ryland Williams; Christopher A. Jones; Richard A. Galbraith; Christopher M. Danforth; Peter Sheridan Dodds

Twitter, a popular social media outlet, has evolved into a vast source of linguistic data, rich with opinion, sentiment, and discussion. Due to the increasing popularity of Twitter, its perceived potential for exerting social influence has led to the rise of a diverse community of automatons, commonly referred to as bots. These inorganic and semi-organic Twitter entities can range from the benevolent (e.g., weather-update bots, help-wanted-alert bots) to the malevolent (e.g., spamming messages, advertisements, or radical opinions). Existing detection algorithms typically leverage meta-data (time between tweets, number of followers, etc.) to identify robotic accounts. Here, we present a powerful classification scheme that exclusively uses the natural language text from organic users to provide a criterion for identifying accounts posting automated messages. Since the classifier operates on text alone, it is flexible and may be applied to any textual data beyond the Twitter-sphere.
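One simple way a text-only criterion like the one described above can work is to compare per-word usage rates between labeled organic and automated accounts. The sketch below uses smoothed log-odds of word frequencies; this is a generic illustration of text-only classification, not the paper's actual classifier, and the training snippets are invented:

```python
import math
from collections import Counter

# Toy stand-ins for labeled training text (the paper trains on real
# labeled Twitter accounts; these examples are made up).
organic = ["haha that game was wild", "anyone else stuck in traffic"]
automated = ["buy now free sample discount", "best sale click now buy"]

def word_log_odds(pos_docs, neg_docs):
    """Per-word log-odds of appearing in automated (pos) vs. organic (neg)
    text, with add-one smoothing so unseen counts stay finite."""
    pos, neg = Counter(), Counter()
    for d in pos_docs:
        pos.update(d.split())
    for d in neg_docs:
        neg.update(d.split())
    vocab = set(pos) | set(neg)
    tp = sum(pos.values()) + len(vocab)
    tn = sum(neg.values()) + len(vocab)
    return {w: math.log((pos[w] + 1) / tp) - math.log((neg[w] + 1) / tn)
            for w in vocab}

def score(text, odds):
    """Positive score suggests automated text; negative suggests organic."""
    return sum(odds.get(w, 0.0) for w in text.split())

odds = word_log_odds(automated, organic)
print(score("free discount buy now", odds) > 0)  # → True
```

Because the criterion depends only on the words themselves, it transfers to any text stream, which is the flexibility the abstract highlights.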


Scientific Reports | 2015

Zipf’s law holds for phrases, not words

Jake Ryland Williams; Paul R. Lessard; Suma Desu; Eric M. Clark; James P. Bagrow; Christopher M. Danforth; Peter Sheridan Dodds

With Zipf’s law being originally and most famously observed for word frequency, it is surprisingly limited in its applicability to human language, holding over no more than three to four orders of magnitude before hitting a clear break in scaling. Here, building on the simple observation that phrases of one or more words comprise the most coherent units of meaning in language, we show empirically that Zipf’s law for phrases extends over as many as nine orders of rank magnitude. In doing so, we develop a principled and scalable statistical mechanical method of random text partitioning, which opens up a rich frontier of rigorous text analysis via a rank ordering of mixed length phrases.
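Zipf's law states that the frequency of the rank-r item falls off as a power law, f(r) ∝ r^(−θ) with θ ≈ 1, so log f versus log r is a straight line of slope −θ. A minimal sketch of measuring that slope from a rank-frequency distribution (the sample data is synthetic, not from the paper):

```python
import math
from collections import Counter

def rank_frequency(tokens):
    """Frequencies sorted into rank order (rank 1 = most frequent)."""
    return sorted(Counter(tokens).values(), reverse=True)

def zipf_exponent(freqs):
    """Least-squares slope of log f vs. log r; Zipf's law predicts ~ -1."""
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))

# A synthetic perfectly Zipfian sample: frequency proportional to 1/rank.
freqs = [1000 / r for r in range(1, 51)]
print(round(zipf_exponent(freqs), 2))  # → -1.0
```

The paper's point is about the range over which this linear scaling holds: for single words the fit breaks after three to four decades of rank, whereas for randomly partitioned phrases it persists for up to nine.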


Expert Review of Pharmacoeconomics & Outcomes Research | 2016

Prevention of treatment-related fluid overload reduces estimated effective cost of prothrombin complex concentrate in patients requiring rapid vitamin K antagonist reversal

Christopher A. Jones; Katrina Ducis; Jeffrey Petrozzino; Eric M. Clark; Mark K. Fung; Christian Peters; Indra Neil Sarkar; Emilia Krol; Brina Pochal; Amanda Boutrus; Peter Weimersheimer; Kalev Freeman

Introduction: Fresh frozen plasma (FFP) is a frequently used human blood product for reversing the effects of vitamin K antagonists. While FFP is relatively economical, its large fluid volume can lead to hospitalization complications, thereby increasing the overall cost of use. Materials & methods: A recent article by Sarode et al. in Circulation described the rate of volume overload associated with FFP use for reversal of vitamin K antagonists. This condition, described as transfusion-associated circulatory overload, has a defined rate of intensive care admission, which also has a well-reported average cost. The additional monetary cost of intensive care unit admission and caring for fluid overload is then compared to the cost of another product, four-factor prothrombin complex concentrates, which, per the Sarode paper, do not result in fluid overload. Results: The increased costs attributed to FFP-associated fluid overload for vitamin K antagonist reversal partly defray the increased upfront cost of four-factor prothrombin complex concentrates. Discussion: FFP is commonly used to acutely reverse the effects of vitamin K antagonists. However, its use requires significant time for infusion, may lead to fluid overload, and is not fully effective in complete anticoagulation reversal. One alternative therapy for anticoagulant reversal is prothrombin complex concentrates, which are rapidly infused, are not associated with fluid overload, and are effective in completely reversing coagulation measurements. These should be considered for patients with acute bleeding emergencies.


Physical Review E | 2015

Identifying missing dictionary entries with frequency-conserving context models.

Jake Ryland Williams; Eric M. Clark; James P. Bagrow; Christopher M. Danforth; Peter Sheridan Dodds

In an effort to better understand meaning in natural language texts, we explore methods aimed at organizing lexical objects into contexts. A number of these methods for organization fall into a family defined by word ordering. Unlike demographic or spatial partitions of data, these collocation models are of special importance for their universal applicability. While we are interested here in text and have framed our treatment appropriately, our work is potentially applicable to other areas of research (e.g., speech, genomics, and mobility patterns) where one has ordered categorical data (e.g., sounds, genes, and locations). Our approach focuses on the phrase (whether word or larger) as the primary meaning-bearing lexical unit and object of study. To do so, we employ our previously developed framework for generating word-conserving phrase-frequency data. Upon training our model with the Wiktionary, an extensive, online, collaborative, and open-source dictionary that contains over 100,000 phrasal definitions, we develop highly effective filters for the identification of meaningful, missing phrase entries. Using our predictions, we then engage the editorial community of the Wiktionary and propose short lists of potential missing entries for definition, developing a breakthrough lexical-extraction technique and expanding our knowledge of the defined English lexicon of phrases.


Proceedings of the National Academy of Sciences of the United States of America | 2015

Reply to Garcia et al.: Common mistakes in measuring frequency-dependent word characteristics.

Peter Sheridan Dodds; Eric M. Clark; Suma Desu; Morgan R. Frank; Andrew J. Reagan; Jake Ryland Williams; Lewis Mitchell; Kameron Decker Harris; Isabel M. Kloumann; James P. Bagrow; Karine Megerdoomian; Matthew T. McMahon; Brian F. Tivnan; Christopher M. Danforth

The concerns expressed by Garcia et al. (1) are misplaced due to a range of misconceptions about word usage frequency, word rank, and expert-constructed word lists such as LIWC (Linguistic Inquiry and Word Count) (2). We provide a complete response in our paper’s online appendices (3). Garcia et al. (1) suggest that the set of function words in the LIWC dataset (2) shows a wide spectrum of average happiness with positive skew (figure 1A in ref. 1) when, according to their interpretation, these words should exhibit a Dirac δ function located at neutral (havg = 5 on a 1–9 scale). However, many words tagged as function words in the LIWC dataset readily elicit an emotional response in raters, as exemplified by “greatest” (havg = 7.26), “best” (havg = 7.26), “negative” (havg = 2.42), and “worst” (havg = 2.10). In our study (3), basic function words that are expected to be neutral, such as “the” (havg = 4.98) and “to” (havg = 4.98), were appropriately scored as such. Moreover, no meaningful statement about biases can be made for sets of words chosen without frequency of use properly incorporated.


International Journal of Web Information Systems | 2015

Measuring climate change on Twitter using Google’s algorithm: perception and events

Ahmed Abdeen Hamed; Alexa A. Ayer; Eric M. Clark; Erin A. Irons; Grant T. Taylor; Asim Zia

Purpose – The purpose of this paper is to test the hypothesis that more complex and emergent hashtags can be sufficient pointers to climate change events. Human-induced climate change is one of this century’s greatest unbalancing forces to have affected our planet. Capturing the public awareness of climate change on Twitter has proven to be significant. In previous research, the authors demonstrated that public awareness is prominently expressed in the form of hashtags that use more than one bigram (i.e., a climate-change term). The research findings showed that this awareness is expressed by more complex terms (e.g., “climate change”). It was learned that the awareness was dominantly expressed using the hashtag #ClimateChange. Design/methodology/approach – The methods demonstrated here use objective computational approaches [i.e., Google’s ranking algorithm and information retrieval measures (e.g., TF-IDF)] to detect and rank the emerging events. Findings – The results show a clear signifi...
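TF-IDF, one of the ranking measures named above, weights a term by its frequency within a document against how many documents contain it, so ubiquitous terms score zero while distinctive ones rank high. A minimal sketch over invented example tweets (tokenization and data are illustrative, not the paper's pipeline):

```python
import math
from collections import Counter

def tfidf(docs):
    """Per-document term scores: term frequency x inverse document frequency."""
    n = len(docs)
    df = Counter()                      # document frequency of each term
    for doc in docs:
        df.update(set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        scores.append({t: (c / len(doc)) * math.log(n / df[t])
                       for t, c in tf.items()})
    return scores

# Invented tokenized tweets for illustration.
tweets = [["#ClimateChange", "march", "today"],
          ["#ClimateChange", "summit", "today"],
          ["weather", "nice", "today"]]

scores = tfidf(tweets)
# "today" appears in every tweet, so its IDF (and TF-IDF) is zero,
# while terms appearing in fewer tweets score higher.
print(round(scores[0]["today"], 3), round(scores[0]["#ClimateChange"], 3))
```

Ranking hashtags by such scores over a time window is one way distinctive, emergent tags can be surfaced as candidate event pointers.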


Journal of Surgical Research | 2016

A pattern-matched Twitter analysis of US cancer-patient sentiments.

W. Christian Crannell; Eric M. Clark; Christopher A. Jones; Ted A. James; Jesse Moore


arXiv: Computation and Language | 2018

A Sentiment Analysis of Breast Cancer Treatment Experiences and Healthcare Perceptions Across Twitter.

Eric M. Clark; Ted A. James; Christopher A. Jones; Amulya Alapati; Promise O. Ukandu; Christopher M. Danforth; Peter Sheridan Dodds

Collaboration

Top co-author: Suma Desu (Massachusetts Institute of Technology)