Daniel Ramage | Researchain

Publication

Featured research published by Daniel Ramage.

empirical methods in natural language processing | 2009

Labeled LDA: A supervised topic model for credit attribution in multi-labeled corpora

Daniel Ramage; David Leo Wright Hall; Ramesh Nallapati; Christopher D. Manning

A significant portion of the world's text is tagged by readers on social bookmarking websites. Credit attribution is an inherent problem in these corpora because most pages have multiple tags, but the tags do not always apply with equal specificity across the whole document. Solving the credit attribution problem requires associating each word in a document with the most appropriate tags and vice versa. This paper introduces Labeled LDA, a topic model that constrains Latent Dirichlet Allocation by defining a one-to-one correspondence between LDA's latent topics and user tags. This allows Labeled LDA to directly learn word-tag correspondences. We demonstrate Labeled LDA's improved expressiveness over traditional LDA with visualizations of a corpus of tagged web pages from del.icio.us. Labeled LDA outperforms SVMs by more than 3 to 1 when extracting tag-specific document snippets. As a multi-label text classifier, our model is competitive with a discriminative baseline on a variety of datasets.
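The constraint described in this abstract, where each topic corresponds to exactly one tag and a document may only use its own tags, can be illustrated with a small collapsed Gibbs sampler. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `labeled_lda`, the hyperparameters `alpha` and `beta`, and the toy documents are all hypothetical.

```python
import random
from collections import defaultdict


def labeled_lda(docs, labels, iterations=50, alpha=0.1, beta=0.01, seed=0):
    """Toy sampler: docs is a list of token lists, labels a list of
    non-empty tag lists (one per document). Returns per-token tag
    assignments and tag-word counts."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    tag_word = defaultdict(lambda: defaultdict(int))  # count(tag, word)
    tag_total = defaultdict(int)                      # count(tag)
    doc_tag = [defaultdict(int) for _ in docs]        # count(doc, tag)

    def update(d, w, t, delta):
        tag_word[t][w] += delta
        tag_total[t] += delta
        doc_tag[d][t] += delta

    # Initialise every token with a tag drawn from its own document's tags.
    z = []
    for d, doc in enumerate(docs):
        z.append([rng.choice(labels[d]) for _ in doc])
        for w, t in zip(doc, z[d]):
            update(d, w, t, +1)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                update(d, w, z[d][i], -1)
                # The key restriction: only this document's tags are candidates.
                weights = [
                    (doc_tag[d][t] + alpha)
                    * (tag_word[t][w] + beta)
                    / (tag_total[t] + vocab_size * beta)
                    for t in labels[d]
                ]
                z[d][i] = rng.choices(labels[d], weights=weights)[0]
                update(d, w, z[d][i], +1)
    return z, tag_word


# Hypothetical tagged pages.
docs = [["python", "code", "recipe", "flour"], ["code", "python"], ["flour", "recipe"]]
labels = [["programming", "cooking"], ["programming"], ["cooking"]]
assignments, tag_word = labeled_lda(docs, labels)
```

Because candidate tags are drawn only from `labels[d]`, every word ends up credited to one of its document's own tags, which is the word-tag correspondence the abstract refers to.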

web search and data mining | 2011

#TwitterSearch: a comparison of microblog search and web search

Jaime Teevan; Daniel Ramage; Meredith Ringel Morris

Social networking Web sites are not just places to maintain relationships; they can also be valuable information sources. However, little is known about how and why people search socially-generated content. In this paper we explore search behavior on the popular microblogging/social networking site Twitter. Using analysis of large-scale query logs and supplemental qualitative data, we observe that people search Twitter to find temporally relevant information (e.g., breaking news, real-time content, and popular trends) and information related to people (e.g., content directed at the searcher, information about people of interest, and general sentiment and opinion). Twitter queries are shorter, more popular, and less likely to evolve as part of a session than Web queries. It appears people repeat Twitter queries to monitor the associated search results, while changing and developing Web queries to learn about a topic. The results returned from the different corpora support these different uses, with Twitter results including more social chatter and social events, and Web results containing more basic facts and navigational content. We discuss the implications of these findings for the design of next-generation Web search tools that incorporate social media.

web search and data mining | 2009

Clustering the tagged web

Daniel Ramage; Paul Heymann; Christopher D. Manning; Hector Garcia-Molina

Automatically clustering web pages into semantic groups promises improved search and browsing on the web. In this paper, we demonstrate how user-generated tags from large-scale social bookmarking websites such as del.icio.us can be used as a complementary data source to page text and anchor text for improving automatic clustering of web pages. This paper explores the use of tags in 1) K-means clustering in an extended vector space model that includes tags as well as page text and 2) a novel generative clustering algorithm based on latent Dirichlet allocation that jointly models text and tags. We evaluate the models by comparing their output to an established web directory. We find that the naive inclusion of tagging data improves cluster quality versus page text alone, but a more principled inclusion can substantially improve the quality of all models with a statistically significant absolute F-score increase of 4%. The generative model outperforms K-means with another 8% F-score increase.
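The first approach in this abstract, K-means in a vector space extended with tags, can be sketched as follows. The weighting shown here (normalising the word and tag vectors separately before concatenating them, so that neither source dominates) is an assumption about what a "principled inclusion" might look like, not a statement of the paper's exact scheme. The function names and toy counts are hypothetical.

```python
import math


def l2_normalize(vec):
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def combine(word_counts, tag_counts):
    # Normalise each source on its own, then concatenate into one vector.
    return l2_normalize(word_counts) + l2_normalize(tag_counts)


def kmeans(points, k, iterations=20):
    # Deterministic initialisation for the sketch: the first k points.
    centroids = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        assignment = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


# Hypothetical pages: (word counts over 3 terms, tag counts over 2 tags).
pages = [
    ([5, 0, 1], [3, 0]),
    ([0, 4, 1], [0, 2]),
    ([4, 1, 0], [2, 0]),
    ([1, 5, 0], [0, 4]),
]
points = [combine(words, tags) for words, tags in pages]
clusters = kmeans(points, k=2)
```

In this toy data the first and third pages share vocabulary and tags, as do the second and fourth, so they fall into the same clusters.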

graph based methods for natural language processing | 2009

WikiWalk: Random walks on Wikipedia for Semantic Relatedness

Eric Yeh; Daniel Ramage; Christopher D. Manning; Eneko Agirre; Aitor Soroa

Computing semantic relatedness of natural language texts is a key component of tasks such as information retrieval and summarization, and often depends on knowledge of a broad range of real-world concepts and relationships. We address this knowledge integration issue by computing semantic relatedness using personalized PageRank (random walks) on a graph derived from Wikipedia. This paper evaluates methods for building the graph, including link selection strategies, and two methods for representing input texts as distributions over the graph nodes: one based on a dictionary lookup, the other based on Explicit Semantic Analysis. We evaluate our techniques on standard word relatedness and text similarity datasets, finding that they capture similarity information complementary to existing Wikipedia-based relatedness measures, resulting in small improvements on a state-of-the-art measure.
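Personalized PageRank, as named in this abstract, can be sketched with plain power iteration: the random walk restarts at the nodes that represent the input text, and the resulting stationary distribution serves as that text's representation. The example below is a sketch under assumptions, not the paper's pipeline: the toy link graph, the damping factor, and the use of cosine similarity to compare two distributions are all illustrative choices.

```python
import math


def personalized_pagerank(graph, teleport, damping=0.85, iterations=100):
    """graph: node -> list of out-links (every node is a key).
    teleport: node -> restart probability, summing to 1."""
    nodes = list(graph)
    rank = {n: teleport.get(n, 0.0) for n in nodes}
    for _ in range(iterations):
        nxt = {n: (1.0 - damping) * teleport.get(n, 0.0) for n in nodes}
        for n in nodes:
            out = graph[n]
            if out:
                share = damping * rank[n] / len(out)
                for m in out:
                    nxt[m] += share
            else:
                # Dangling node: hand its mass back to the restart distribution.
                for m in nodes:
                    nxt[m] += damping * rank[n] * teleport.get(m, 0.0)
        rank = nxt
    return rank


def cosine(u, v):
    dot = sum(u[n] * v.get(n, 0.0) for n in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


# Hypothetical article link graph with two unconnected neighbourhoods.
graph = {
    "Cat": ["Pet", "Mammal"],
    "Dog": ["Pet", "Mammal"],
    "Pet": ["Cat", "Dog"],
    "Mammal": ["Cat", "Dog"],
    "Car": ["Engine"],
    "Engine": ["Car"],
}
cat = personalized_pagerank(graph, {"Cat": 1.0})
dog = personalized_pagerank(graph, {"Dog": 1.0})
car = personalized_pagerank(graph, {"Car": 1.0})
```

Walks started at "Cat" and "Dog" spread over the same neighbourhood, so their distributions overlap, while a walk started at "Car" never reaches it.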

meeting of the association for computational linguistics | 2007

Learning Alignments and Leveraging Natural Logic

Nathanael Chambers; Daniel M. Cer; Trond Grenager; David Leo Wright Hall; Chloé Kiddon; Bill MacCartney; Marie-Catherine de Marneffe; Daniel Ramage; Eric Yeh; Christopher D. Manning

We describe an approach to textual inference that improves alignments at both the typed dependency level and at a deeper semantic level. We present a machine learning approach to alignment scoring, a stochastic search procedure, and a new tool that finds deeper semantic alignments, allowing rapid development of semantic features over the aligned graphs. Further, we describe a complementary semantic component based on natural logic, which shows an added gain of 3.13% accuracy on the RTE3 test set.

computer and communications security | 2017

Practical Secure Aggregation for Privacy-Preserving Machine Learning

Keith Allen Bonawitz; Vladimir Ivanov; Ben Kreuter; Antonio Marcedone; H. Brendan McMahan; Sarvar Patel; Daniel Ramage; Aaron Segal; Karn Seth

We design a novel, communication-efficient, failure-robust protocol for secure aggregation of high-dimensional data. Our protocol allows a server to compute the sum of large, user-held data vectors from mobile devices in a secure manner (i.e. without learning each user's individual contribution), and can be used, for example, in a federated learning setting, to aggregate user-provided model updates for a deep neural network. We prove the security of our protocol in the honest-but-curious and active adversary settings, and show that security is maintained even if an arbitrarily chosen subset of users drop out at any time. We evaluate the efficiency of our protocol and show, by complexity analysis and a concrete implementation, that its runtime and communication overhead remain low even on large data sets and client pools. For 16-bit input values, our protocol offers 1.73x communication expansion for 2^10 users and 2^20-dimensional vectors, and 1.98x expansion for 2^14 users and 2^24-dimensional vectors over sending data in the clear.

Proceedings of the 2007 workshop on Experimental computer science | 2007

RA: ResearchAssistant for the computational sciences

Daniel Ramage; Adam J. Oliner

Computational experiments often discard large amounts of valuable data, such as invocation parameters and the lineage of output. Our goal is to identify, manage, capture, and organize this information. These data can be used to make the scientific process simpler and more efficient, and to increase the value of the research by making it more rigorous and reproducible. Research Assistant (RA) is an open source Java programming tool that helps to plug this information leak. RA ensures that all console output is valid XML; saves invocation parameters, the random seed, and code version information; automatically checkpoints intermediate results; creates runnable experiment packages; and keeps meticulous notes. This paper presents the design and implementation of RA, and shows how RA easily scales to make complex experiments repeatable.

Genome Research | 2003

Cytoscape: A Software Environment for Integrated Models of Biomolecular Interaction Networks

Paul Shannon; Andrew Markiel; Owen Ozier; Nitin S. Baliga; Jonathan T. Wang; Daniel Ramage; Nada Amin; Benno Schwikowski; Trey Ideker

international conference on weblogs and social media | 2010