Daniel Gruhl | Researchain

Publication

Featured publications by Daniel Gruhl.

IEEE Computer | 2007

Steps toward a science of service systems

Jim Spohrer; Paul P. Maglio; John H. Bailey; Daniel Gruhl

The service sector accounts for most of the world's economic activity, but it's the least-studied part of the economy. A service system comprises people and technologies that adaptively compute and adjust to a system's changing value of knowledge. A science of service systems could provide theory and practice around service innovation.

International World Wide Web Conferences | 2003

SemTag and seeker: bootstrapping the semantic web via automated semantic annotation

Stephen Dill; Nadav Eiron; David Gibson; Daniel Gruhl; Ramanathan V. Guha; Anant Jhingran; Tapas Kanungo; Sridhar Rajagopalan; Andrew Tomkins; John A. Tomlin; Jason Y. Zien

This paper describes Seeker, a platform for large-scale text analytics, and SemTag, an application written on the platform to perform automated semantic tagging of large corpora. We apply SemTag to a collection of approximately 264 million web pages, and generate approximately 434 million automatically disambiguated semantic tags, published to the web as a label bureau providing metadata regarding the 434 million annotations. To our knowledge, this is the largest scale semantic tagging effort to date. We describe the Seeker platform, discuss the architecture of the SemTag application, describe a new disambiguation algorithm specialized to support ontological disambiguation of large-scale data, evaluate the algorithm, and present our final results with information about acquiring and making use of the semantic tags. We argue that automated large-scale semantic tagging of ambiguous content can bootstrap and accelerate the creation of the semantic web.

Knowledge Discovery and Data Mining | 2005

The predictive power of online chatter

Daniel Gruhl; Ramanathan V. Guha; Ravi Kumar; Jasmine Novak; Andrew Tomkins

An increasing fraction of the global discourse is migrating online in the form of blogs, bulletin boards, web pages, wikis, editorials, and a dizzying array of new collaborative technologies. The migration has now proceeded to the point that topics reflecting certain individual products are sufficiently popular to allow targeted online tracking of the ebb and flow of chatter around these topics. Based on an analysis of around half a million sales rank values for 2,340 books over a period of four months, and correlating postings in blogs, media, and web pages, we are able to draw several interesting conclusions. First, carefully hand-crafted queries produce matching postings whose volume predicts sales ranks. Second, these queries can be automatically generated in many cases. And third, even though sales rank motion might be difficult to predict in general, algorithmic predictors can use online postings to successfully predict spikes in sales rank.

Journal of Web Semantics | 2003

A case for automated large-scale semantic annotation

Stephen Dill; Nadav Eiron; David Gibson; Daniel Gruhl; Ramanathan V. Guha; Anant Jhingran; Tapas Kanungo; Kevin S. McCurley; Sridhar Rajagopalan; Andrew Tomkins; John A. Tomlin; Jason Y. Zien

This paper describes Seeker, a platform for large-scale text analytics, and SemTag, an application written on the platform to perform automated semantic tagging of large corpora. We apply SemTag to a collection of approximately 264 million web pages, and generate approximately 434 million automatically disambiguated semantic tags, published to the web as a label bureau providing metadata regarding the 434 million annotations. To our knowledge, this is the largest scale semantic tagging effort to date. We describe the Seeker platform, discuss the architecture of the SemTag application, describe a new disambiguation algorithm specialized to support ontological disambiguation of large-scale data, evaluate the algorithm, and present our final results with information about acquiring and making use of the semantic tags. We argue that automated large-scale semantic tagging of ambiguous content can bootstrap and accelerate the creation of the semantic web.

International World Wide Web Conferences | 2004

An evaluation of binary XML encoding optimizations for fast stream based XML processing

Roberto J. Bayardo; Daniel Gruhl; Vanja Josifovski; Jussi Petri Myllymaki

This paper provides an objective evaluation of the performance impacts of binary XML encodings, using a fast stream-based XQuery processor as our representative application. Instead of proposing one binary format and comparing it against standard XML parsers, we investigate the individual effects of several binary encoding techniques that are shared by many proposals. Our goal is to provide a deeper understanding of the performance impacts of binary XML encodings in order to clarify the ongoing and often contentious debate over their merits, particularly in the domain of high performance XML stream processing.

Information Hiding | 1998

Information Hiding to Foil the Casual Counterfeiter

Daniel Gruhl; Walter Bender

Security documents (currency, treasury bills, stocks, bonds, birth certificates, etc.) provide an interesting problem space for investigating information hiding. Recent advances in the quality of consumer printers and scanners have allowed the application of traditional information hiding techniques to printed materials. This paper explores how some of those techniques might be used to address the problem of counterfeiting as the capability of home printers to produce “exact” copies improves.

International World Wide Web Conferences | 2001

Vinci: a service-oriented architecture for rapid development of web applications

Rakesh Agrawal; Roberto J. Bayardo; Daniel Gruhl; Spiros Papadimitriou

Vinci is a local area service-oriented architecture designed for rapid development and management of robust web applications. Based on XML document exchange, Vinci is designed to complement and interoperate with wide area service-oriented architectures such as E-Speak and .NET. This paper presents the Vinci architecture, the rationale behind its design, and an evaluation of its performance. Specifically, we show how systems architected with Vinci are developed quickly, scaled effortlessly, and easily moved from prototype to production.

International World Wide Web Conferences | 2002

YouServ: a web-hosting and content sharing tool for the masses

Roberto J. Bayardo; Rakesh Agrawal; Daniel Gruhl; Amit Somani

YouServ is a system that allows its users to pool existing desktop computing resources for high availability web hosting and file sharing. By exploiting standard web and internet protocols (e.g. HTTP and DNS), YouServ does not require those who access YouServ-published content to install special purpose software. Because it requires minimal server-side resources and administration, YouServ can be provided at a very low cost. We describe the design, implementation, and a successful intranet deployment of the YouServ system, and compare it with several alternatives.

IEEE International Conference on Healthcare Informatics, Imaging and Systems Biology | 2012

SPOT the Drug! An Unsupervised Pattern Matching Method to Extract Drug Names from Very Large Clinical Corpora

Anni Coden; Daniel Gruhl; Neal Lewis; Joe Terdiman

Although structured electronic health records are becoming more prevalent, much information about patient health is still recorded only in unstructured text. “Understanding” these texts has been a focus of natural language processing (NLP) research for many years, with some remarkable successes, yet there is more work to be done. Knowing the drugs patients take is not only critical for understanding patient health (e.g., for drug-drug interactions or drug-enzyme interaction), but also for secondary uses, such as research on treatment effectiveness. Several drug dictionaries have been curated, such as RxNorm, FDA's Orange Book, or NCI, with a focus on prescription drugs. Developing these dictionaries is a challenge, but even more challenging is keeping these dictionaries up-to-date in the face of a rapidly advancing field: it is critical to identify grapefruit as a “drug” for a patient who takes the prescription medicine Lipitor, due to their known adverse interaction. To discover other, new adverse drug interactions, a large number of patient histories often need to be examined, necessitating not only accurate but also fast algorithms to identify pharmacological substances. In this paper we propose a new algorithm, SPOT, which identifies drug names that can be used as new dictionary entries from a large corpus, where a “drug” is defined as a substance intended for use in the diagnosis, cure, mitigation, treatment, or prevention of disease. Measured against a manually annotated reference corpus, we present precision and recall values for SPOT. SPOT is language and syntax independent, can be run efficiently to keep dictionaries up-to-date and to also suggest words and phrases which may be misspellings or uncatalogued synonyms of a known drug. We show how SPOT's lack of reliance on NLP tools makes it robust in analyzing clinical medical text.
SPOT is a generalized bootstrapping algorithm, seeded with a known dictionary and automatically extracting the context within which each drug is mentioned. We define three features of such context: support, confidence and prevalence. Finally, we present the performance tradeoffs depending on the thresholds chosen for these features.
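The abstract describes SPOT as a bootstrapping algorithm that is seeded with a known dictionary and scores the contexts in which drugs are mentioned by support, confidence and prevalence. Since the paper's exact definitions are not given here, the following Python sketch shows only a generic dictionary-seeded context bootstrapping loop, not SPOT itself. It assumes a context is the pair of tokens immediately left and right of a word, and it uses hypothetical definitions: support is the number of seed-drug occurrences in a context, confidence is the fraction of that context's occurrences filled by a seed drug, and prevalence is the fraction of a candidate word's corpus occurrences that fall inside accepted contexts. The function names and default thresholds are illustrative.

```python
from collections import Counter, defaultdict

# Hypothetical context definition: the (left, right) neighbouring tokens.
Context = tuple[str, str]


def context_of(tokens: list[str], i: int) -> Context:
    """Return the tokens immediately left and right of position i."""
    left = tokens[i - 1] if i > 0 else "<s>"
    right = tokens[i + 1] if i + 1 < len(tokens) else "</s>"
    return (left, right)


def bootstrap_drug_names(
    sentences: list[str],
    seeds: set[str],
    min_support: int = 2,
    min_confidence: float = 0.5,
    min_prevalence: float = 0.5,
) -> dict[str, float]:
    """Propose new dictionary entries from contexts shared with seed drugs.

    Returns a mapping from candidate word to its (hypothetical) prevalence.
    """
    seeds = {s.lower() for s in seeds}
    total = Counter()               # context -> all occurrences
    support = Counter()             # context -> occurrences filled by a seed drug
    fillers = defaultdict(Counter)  # context -> counts of words seen in it
    word_count = Counter()          # word -> occurrences anywhere in the corpus

    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            ctx = context_of(tokens, i)
            total[ctx] += 1
            fillers[ctx][word] += 1
            word_count[word] += 1
            if word in seeds:
                support[ctx] += 1

    # Keep contexts that seed drugs use often (support) and mostly (confidence).
    accepted = [
        ctx for ctx, s in support.items()
        if s >= min_support and s / total[ctx] >= min_confidence
    ]

    # Count non-seed words appearing in accepted contexts.
    in_accepted = Counter()
    for ctx in accepted:
        for word, n in fillers[ctx].items():
            if word not in seeds:
                in_accepted[word] += n

    # Prevalence: share of a candidate's occurrences that fall in accepted contexts.
    return {
        word: n / word_count[word]
        for word, n in in_accepted.items()
        if n / word_count[word] >= min_prevalence
    }


if __name__ == "__main__":
    corpus = [
        "Patient takes Lipitor daily",
        "Patient takes aspirin daily",
        "Patient takes grapefruit daily",
        "Patient eats breakfast daily",
    ]
    print(bootstrap_drug_names(corpus, {"Lipitor", "aspirin"}))  # {'grapefruit': 1.0}
```

On this toy corpus the context ("takes", "daily") is shared by both seeds, so it passes the support and confidence thresholds and the sketch proposes grapefruit, echoing the abstract's example; breakfast appears only in an unsupported context and is ignored. Raising `min_confidence` or `min_prevalence` trades recall for precision, in the spirit of the threshold tradeoffs the abstract mentions.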

Very Large Data Bases | 2010

Multimodal social intelligence in a real-time dashboard system

Daniel Gruhl; Meenakshi Nagarajan; Jan Pieper; Christine Robson; Amit P. Sheth

Social Networks provide one of the most rapidly evolving data sets in existence today. Traditional Business Intelligence applications struggle to take advantage of such data sets in a timely manner. The BBC SoundIndex, developed by the authors and others, enabled real-time analytics of music popularity using data from a variety of Social Networks. We present this system as a grounding example of how to overcome the challenges of working with this data from social networks. We discuss a variety of technologies to implement near real-time data analytics to transform Social Intelligence into Business Intelligence and evaluate their effectiveness in the music domain. The SoundIndex project helped to highlight a number of key research areas, including named entity recognition and sentiment analysis in Informal English. It also drew attention to the importance of metadata aggregation in multimodal environments. We explored challenges such as drawing data from a wide set of sources spanning a myriad of modalities, developing adjudication techniques to harmonize inputs, and performing deep analytics on extremely challenging Informal English snippets. Ultimately, we seek to provide guidance on developing applications in a variety of domains that allow an analyst to rapidly grasp the evolution in the social landscape, and show how to validate such a system for a real-world application.