Partha Pratim Talukdar

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Partha Pratim Talukdar is active.

Explore More

Publication

Featured researches published by Partha Pratim Talukdar.

european conference on machine learning | 2009

New Regularized Algorithms for Transductive Learning

Partha Pratim Talukdar; Koby Crammer

We propose a new graph-based label propagation algorithm for transductive learning. Each example is associated with a vertex in an undirected graph and a weighted edge between two vertices represents similarity between the two corresponding example. We build on Adsorption, a recently proposed algorithm and analyze its properties. We then state our learning algorithm as a convex optimization problem over multi-label assignments and derive an efficient algorithm to solve this problem. We state the conditions under which our algorithm is guaranteed to converge. We provide experimental evidence on various real-world datasets demonstrating the effectiveness of our algorithm over other algorithms for such problems. We also show that our algorithm can be extended to incorporate additional prior information, and demonstrate it with classifying data where the labels are not mutually exclusive.

empirical methods in natural language processing | 2008

Weakly-Supervised Acquisition of Labeled Class Instances using Graph Random Walks

Partha Pratim Talukdar; Joseph Reisinger; Marius Pasca; Deepak Ravichandran; Rahul Bhagat; Fernando Pereira

We present a graph-based semi-supervised label propagation algorithm for acquiring open-domain labeled classes and their instances from a combination of unstructured and structured text sources. This acquisition method significantly improves coverage compared to a previous set of labeled classes and instances derived from free text, while achieving comparable precision.

conference on computational natural language learning | 2006

A Context Pattern Induction Method for Named Entity Extraction

Partha Pratim Talukdar; Thorsten Brants; Mark Liberman; Fernando Pereira

We present a novel context pattern induction method for information extraction, specifically named entity extraction. Using this method, we extended several classes of seed entity lists into much larger high-precision lists. Using token membership in these extended lists as additional features, we improved the accuracy of a conditional random field-based named entity tagger. In contrast, features derived from the seed lists decreased extractor accuracy.

international conference on management of data | 2008

The ORCHESTRA Collaborative Data Sharing System

Zachary G. Ives; Todd J. Green; Grigoris Karvounarakis; Nicholas E. Taylor; Val Tannen; Partha Pratim Talukdar; Marie Jacob; Fernando Pereira

Sharing structured data today requires standardizing upon a single schema, then mapping and cleaning all of the data. This results in a single queriable mediated data instance. However, for settings in which structured data is being collaboratively authored by a large community, e.g., in the sciences, there is often a lack of consensus about how it should be represented, what is correct, and which sources are authoritative. Moreover, such data is seldom static: it is frequently updated, cleaned, and annotated. The ORCHESTRA collaborative data sharing system develops a new architecture and consistency model for such settings, based on the needs of data sharing in the life sciences. In this paper we describe the basic architecture and implementation of the ORCHESTRA system, and summarize some of the open challenges that arise in this setting.

meeting of the association for computational linguistics | 2007

Automatic Code Assignment to Medical Text

Koby Crammer; Mark Dredze; Kuzman Ganchev; Partha Pratim Talukdar; Steven Carroll

Code assignment is important for handling large amounts of electronic medical data in the modern hospital. However, only expert annotators with extensive training can assign codes. We present a system for the assignment of ICD-9-CM clinical codes to free text radiology reports. Our system assigns a code configuration, predicting one or more codes for each document. We combine three coding systems into a single learning system for higher accuracy. We compare our system on a real world medical dataset with both human annotators and other automated systems, achieving nearly the maximum score on the Computational Medicine Centers challenge.

empirical methods in natural language processing | 2014

Incorporating Vector Space Similarity in Random Walk Inference over Knowledge Bases

Matt Gardner; Partha Pratim Talukdar; Jayant Krishnamurthy; Tom M. Mitchell

Much work in recent years has gone into the construction of large knowledge bases (KBs), such as Freebase, DBPedia, NELL, and YAGO. While these KBs are very large, they are still very incomplete, necessitating the use of inference to fill in gaps. Prior work has shown how to make use of a large text corpus to augment random walk inference over KBs. We present two improvements to the use of such large corpora to augment KB inference. First, we present a new technique for combining KB relations and surface text into a single graph representation that is much more compact than graphs used in prior work. Second, we describe how to incorporate vector space similarity into random walk inference over KBs, reducing the feature sparsity inherent in using surface text. This allows us to combine distributional similarity with symbolic logical inference in novel and effective ways. With experiments on many relations from two separate KBs, we show that our methods significantly outperform prior work on KB inference, both in the size of problem our methods can handle and in the quality of predictions made.

PLOS ONE | 2014

Simultaneously uncovering the patterns of brain regions involved in different story reading subprocesses

Leila Wehbe; Brian Murphy; Partha Pratim Talukdar; Alona Fyshe; Aaditya Ramdas; Tom M. Mitchell

Story understanding involves many perceptual and cognitive subprocesses, from perceiving individual words, to parsing sentences, to understanding the relationships among the story characters. We present an integrated computational model of reading that incorporates these and additional subprocesses, simultaneously discovering their fMRI signatures. Our model predicts the fMRI activity associated with reading arbitrary text passages, well enough to distinguish which of two story segments is being read with 74% accuracy. This approach is the first to simultaneously track diverse reading subprocesses during complex story processing and predict the detailed neural representation of diverse story features, ranging from visual word properties to the mention of different story characters and different actions they perform. We construct brain representation maps that replicate many results from a wide range of classical studies that focus each on one aspect of language processing and offer new insights on which type of information is processed by different areas involved in language processing. Additionally, this approach is promising for studying individual differences: it can be used to create single subject maps that may potentially be used to measure reading comprehension and diagnose reading disorders.

siam international conference on data mining | 2014

FlexiFaCT: Scalable Flexible Factorization of Coupled Tensors on Hadoop

Alex Beutel; Partha Pratim Talukdar; Abhimanu Kumar; Christos Faloutsos; Evangelos E. Papalexakis; Eric P. Xing

Given multiple data sets of relational data that share a number of dimensions, how can we efficiently decompose our data into the latent factors? Factorization of a single matrix or tensor has attracted much attention, as, e.g., in the Netflix challenge, with users rating movies. However, we often have additional, side, information, like, e.g., demographic data about the users, in the Netflix example above. Incorporating the additional information leads to the coupled factorization problem. So far, it has been solved for relatively small datasets. We provide a distributed, scalable method for decomposing matrices, tensors, and coupled data sets through stochastic gradient descent on a variety of objective functions. We offer the following contributions: (1) Versatility: Our algorithm can perform matrix, tensor, and coupled factorization, with flexible objective functions including the Frobenius norm, Frobenius norm with an `1 induced sparsity, and non-negative factorization. (2) Scalability: FlexiFaCT scales to unprecedented sizes in both the data and model, with up to billions of parameters. FlexiFaCT runs on standard Hadoop. (3) Convergence proofs showing that FlexiFaCT converges on the variety of objective functions, even with projections.

international conference on management of data | 2010

Automatically incorporating new sources in keyword search-based data integration

Partha Pratim Talukdar; Zachary G. Ives; Fernando Pereira

Scientific data offers some of the most interesting challenges in data integration today. Scientific fields evolve rapidly and accumulate masses of observational and experimental data that needs to be annotated, revised, interlinked, and made available to other scientists. From the perspective of the user, this can be a major headache as the data they seek may initially be spread across many databases in need of integration. Worse, even if users are given a solution that integrates the current state of the source databases, new data sources appear with new data items of interest to the user. Here we build upon recent ideas for creating integrated views over data sources using keyword search techniques, ranked answers, and user feedback [32] to investigate how to automatically discover when a new data source has content relevant to a users view - in essence, performing automatic data integration for incoming data sets. The new architecture accommodates a variety of methods to discover related attributes, including label propagation algorithms from the machine learning community [2] and existing schema matchers [11]. The user may provide feedback on the suggested new results, helping the system repair any bad alignments or increase the cost of including a new source that is not useful. We evaluate our approach on actual bioinformatics schemas and data, using state-of-the-art schema matchers as components. We also discuss how our architecture can be adapted to more traditional settings with a mediated schema.

conference on information and knowledge management | 2012

Acquiring temporal constraints between relations

Partha Pratim Talukdar; Derry Tanti Wijaya; Tom M. Mitchell

We consider the problem of automatically acquiring knowledge about the typical temporal orderings among relations (e.g., actedIn(person, film) typically occurs before wonPrize (film, award)), given only a database of known facts (relation instances) without time information, and a large document collection. Our approach is based on the conjecture that the narrative order of verb mentions within documents correlates with the temporal order of the relations they represent. We propose a family of algorithms based on this conjecture, utilizing a corpus of 890m dependency parsed sentences to obtain verbs that represent relations of interest, and utilizing Wikipedia documents to gather statistics on narrative order of verb mentions. Our proposed algorithm, GraphOrder, is a novel and scalable graph-based label propagation algorithm that takes transitivity of temporal order into account, as well as these statistics on narrative order of verb mentions. This algorithm achieves as high as 38.4% absolute improvement in F1 over a random baseline. Finally, we demonstrate the utility of this learned general knowledge about typical temporal orderings among relations, by showing that these temporal constraints can be successfully used by a joint inference framework to assign specific temporal scopes to individual facts.

Explore More