Sreeram V. Balakrishnan

Publication

Featured research published by Sreeram V. Balakrishnan.

International Conference on Management of Data | 2010

Midas: integrating public financial data

Sreeram V. Balakrishnan; Vivian Chu; Mauricio A. Hernández; Howard Ho; Rajasekar Krishnamurthy; Shixia Liu; Jan Pieper; Jeffrey S. Pierce; Lucian Popa; Christine Robson; Lei Shi; Ioana Stanoi; Edison Lao Ting; Shivakumar Vaithyanathan; Huahai Yang

The primary goal of the Midas project is to build a system that enables easy and scalable integration of unstructured and semi-structured information present across multiple data sources. As a first step in this direction, we have built a system that extracts and integrates information from regulatory filings submitted to the U.S. Securities and Exchange Commission (SEC) and the Federal Deposit Insurance Corporation (FDIC). Midas creates a repository of entities, events, and relationships by extracting, conceptualizing, integrating, and aggregating data from unstructured and semi-structured documents. This repository enables applications to use the extracted and integrated data in a variety of ways including mashups with other public data and complex risk analysis.

Inductive Logic Programming | 2007

Using ILP to construct features for information extraction from semi-structured text

Ganesh Ramakrishnan; Sachindra Joshi; Sreeram V. Balakrishnan; Ashwin Srinivasan

Machine-generated documents containing semi-structured text are rapidly forming the bulk of data being stored in an organisation. Given a feature-based representation of such data, methods like SVMs are able to construct good models for information extraction (IE). But how are the feature definitions to be obtained in the first place? (We are referring here to the representation problem: selecting good features from the ones defined comes later.) So far, features have been defined manually or by using special-purpose programs; neither approach scales well to handle the heterogeneity of the data or new domain-specific information. We suggest that Inductive Logic Programming (ILP) could assist in this. Specifically, we demonstrate the use of ILP to define features for seven IE tasks using two disparate sources of information. Our findings are as follows: (1) the ILP system is able to efficiently identify large numbers of good features. Typically, the time taken to identify the features is comparable to the time taken to construct the predictive model; and (2) SVM models constructed with these ILP features are better than the best reported to date that rely heavily on hand-crafted features. For the ILP practitioner, we also present evidence supporting the claim that, for IE tasks, using an ILP system to assist in constructing an extensional representation of text data (in the form of features and their values) is better than using it to construct intensional models for the tasks (in the form of rules for information extraction).

International Conference on Data Engineering | 2006

Automatic Sales Lead Generation from Web Data

Ganesh Ramakrishnan; Sachindra Joshi; Sumit Negi; Raghu Krishnapuram; Sreeram V. Balakrishnan

Speed to market is critical to companies that are driven by sales in a competitive market. The earlier a potential customer can be approached in the decision-making process of a purchase, the higher the chances of converting that prospect into a customer. Traditional methods to identify sales leads, such as company surveys and direct marketing, are manual, expensive and not scalable. Over the past decade the World Wide Web has grown into an information mesh, with most important facts being reported through Web sites. Several newspapers, press releases, trade journals, business magazines and other related sources are online. These sources could be used to identify prospective buyers automatically. In this paper, we present a system called ETAP (Electronic Trigger Alert Program) that extracts trigger events from Web data that help in identifying prospective buyers. Trigger events are events of corporate relevance that indicate the propensity of companies to purchase new products associated with these events. Examples of trigger events are changes in management, revenue growth, and mergers and acquisitions. The unstructured nature of the information makes the extraction of trigger events difficult. We pose trigger event extraction as a classification problem and develop methods for learning trigger event classifiers using existing classification methods. We present methods to automatically generate the training data required to learn the classifiers. We also propose a method of feature abstraction that uses named entity recognition to solve the problem of data sparsity. We score and rank the trigger events extracted by ETAP for easy browsing. Our experiments show the effectiveness of the method and thus establish the feasibility of automatic sales lead generation using Web data.
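The feature abstraction idea mentioned in this abstract can be illustrated with a minimal sketch. It is not the ETAP implementation: the gazetteer `ENTITY_TYPES`, the function `abstract_entities`, and the example sentences are all hypothetical, and a dictionary lookup stands in for a real named entity recognizer. The point is only that replacing entity mentions with their types lets sentences about different companies and people share the same features.

```python
import re

# Hypothetical gazetteer standing in for a real named entity recognizer.
ENTITY_TYPES = {
    "Acme Corp": "ORG",
    "Globex": "ORG",
    "Jane Doe": "PERSON",
    "John Smith": "PERSON",
}

def abstract_entities(sentence: str) -> str:
    """Replace each known entity mention with a placeholder for its type."""
    # Try longer names first so that overlapping names match greedily.
    names = sorted(ENTITY_TYPES, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(name) for name in names))
    return pattern.sub(lambda m: f"<{ENTITY_TYPES[m.group(0)]}>", sentence)

# Two different surface sentences collapse to one abstracted form,
# so a classifier sees the same features for both.
a = abstract_entities("Acme Corp names Jane Doe as CEO")
b = abstract_entities("Globex names John Smith as CEO")
print(a)
print(b)
```

Both calls produce the same abstracted string, which is how abstraction of this kind reduces sparsity: a pattern seen once for one company generalizes to every other company.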

International Conference on Acoustics, Speech, and Signal Processing | 2004

Asynchronous HMM with applications to speech recognition

Ashutosh Garg; Sreeram V. Balakrishnan; Shivakumar Vaithyanathan

We develop a novel formalism for modeling speech signals which are irregularly or incompletely sampled. This situation can arise in real-world applications where the speech signal is transmitted over an error-prone channel and parts of the signal can be dropped. Typical speech systems based on hidden Markov models (HMMs) cannot handle such data, since HMMs rely on the assumption that observations are complete and made at regular intervals. We introduce the asynchronous HMM, a variant of the inhomogeneous HMM commonly used in bioinformatics, and show how it can be used to model irregularly or incompletely sampled data. A nested EM algorithm, which can be used to learn the parameters of this asynchronous HMM, is presented in brief. Evaluation on real-world speech data, modified to simulate channel errors, shows that this model and its variants significantly outperform the standard HMM and methods based on data interpolation.
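To make the missing-data setting concrete, here is a minimal sketch of the forward algorithm for an ordinary discrete HMM in which dropped frames are marked `None` and marginalized out. This is the textbook way to score a sequence when the positions of the gaps are known; it is not the asynchronous HMM or the nested EM algorithm from the paper. The function name and the toy parameters are assumptions chosen for illustration.

```python
def forward_likelihood(init, trans, emit, observations):
    """Likelihood of a sequence under a discrete HMM.

    init[i]      : probability of starting in state i
    trans[i][j]  : probability of moving from state i to state j
    emit[j][o]   : probability of emitting symbol o in state j
    observations : list of symbol indices; None marks a dropped frame
    """
    if not observations:
        return 1.0
    n = len(init)
    alpha = None
    for t, obs in enumerate(observations):
        if t == 0:
            predicted = list(init)
        else:
            # Propagate the state distribution one time step forward.
            predicted = [
                sum(alpha[i] * trans[i][j] for i in range(n)) for j in range(n)
            ]
        if obs is None:
            # Dropped frame: summing over all possible symbols gives a
            # factor of 1, so the prediction passes through unchanged.
            alpha = predicted
        else:
            alpha = [predicted[j] * emit[j][obs] for j in range(n)]
    return sum(alpha)

# Toy two-state model (assumed values, for illustration only).
init = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.9, 0.1], [0.2, 0.8]]

print(forward_likelihood(init, trans, emit, [0, None, 1]))
```

The sketch requires knowing which time steps were dropped. The abstract's setting, where samples are irregular or incomplete, is harder, which is the motivation for a dedicated model.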

Archive | 2006