Publication


Featured research published by Kris Ganjam.


international conference on management of data | 2003

Robust and efficient fuzzy match for online data cleaning

Surajit Chaudhuri; Kris Ganjam; Venkatesh Ganti; Rajeev Motwani

To ensure high data quality, data warehouses must validate and cleanse incoming data tuples from external sources. In many situations, clean tuples must match acceptable tuples in reference tables. For example, product name and description fields in a sales record from a distributor must match the pre-recorded name and description fields in a product reference relation. A significant challenge in such a scenario is to implement an efficient and accurate fuzzy match operation that can effectively clean an incoming tuple if it fails to match exactly with any tuple in the reference relation. In this paper, we propose a new similarity function which overcomes limitations of commonly used similarity functions, and develop an efficient fuzzy match algorithm. We demonstrate the effectiveness of our techniques by evaluating them on real datasets.
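As a rough illustration of the fuzzy match setting described in this abstract, the sketch below scores an incoming tuple against a small reference relation with an IDF-weighted token overlap. This is a simplified stand-in, not the similarity function proposed in the paper: the names `idf_weights`, `similarity`, and `fuzzy_match`, the weighting formula, and the sample data are all assumptions made for illustration, and the sketch ignores character-level edits within tokens.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def idf_weights(reference):
    """Give rare tokens a higher weight than tokens shared by many rows."""
    n = len(reference)
    df = Counter(tok for row in reference for tok in set(tokenize(row)))
    return {tok: math.log(1 + n / count) for tok, count in df.items()}


def similarity(a, b, weights, default_weight):
    """Weighted Jaccard overlap between the token sets of two strings."""
    ta, tb = set(tokenize(a)), set(tokenize(b))

    def weight(tok):
        # Tokens never seen in the reference relation get the maximum weight.
        return weights.get(tok, default_weight)

    shared = sum(weight(t) for t in ta & tb)
    total = sum(weight(t) for t in ta | tb)
    return shared / total if total else 0.0


def fuzzy_match(query, reference):
    """Return the reference row most similar to the incoming tuple."""
    weights = idf_weights(reference)
    default_weight = math.log(1 + len(reference))
    return max(
        reference,
        key=lambda row: similarity(query, row, weights, default_weight),
    )


# Hypothetical reference relation and a misspelled incoming tuple.
REFERENCE = [
    "Boeing Company Seattle WA",
    "Bon Corporation Seattle WA",
    "Companions Seattle WA",
]
best = fuzzy_match("Beoing Company Seattle WA", REFERENCE)
print(best)
```

Because "Seattle" and "WA" occur in every reference row, they contribute little to the score, so the rarer token "Company" decides the match even though "Beoing" is misspelled.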


international conference on management of data | 2005

Data cleaning in Microsoft SQL Server 2005

Surajit Chaudhuri; Kris Ganjam; Venkatesh Ganti; Rahul Kapoor; Vivek R. Narasayya; Theo Vassilakis

When collecting and combining data from various sources into a data warehouse, ensuring high data quality and consistency becomes a significant, often expensive, challenge. Common data quality problems include inconsistent data conventions amongst sources such as different abbreviations or synonyms; data entry errors such as spelling mistakes; missing, incomplete, outdated or otherwise incorrect attribute values. These data defects generally manifest themselves as foreign-key mismatches and approximately duplicate records, both of which make further data mining and decision support analyses either impossible or suspect. We demonstrate two new data cleansing operators, Fuzzy Lookup and Fuzzy Grouping, which address these problems in a scalable and domain-independent manner. These operators are implemented within Microsoft SQL Server 2005 Integration Services. Our demo will explain their functionality and highlight multiple real-world scenarios in which they can be used to achieve high data quality.
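To make the idea of grouping approximately duplicate records concrete, here is a minimal sketch that clusters similar strings using Python's standard `difflib.SequenceMatcher`. It is not the Fuzzy Grouping operator from SQL Server Integration Services and does not use its algorithm; the function name `group_duplicates`, the 0.85 threshold, and the sample records are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def similar(a, b, threshold):
    """Case-insensitive character-level similarity check."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def group_duplicates(records, threshold=0.85):
    """Greedily assign each record to the first group whose
    representative (its first member) is similar enough."""
    groups = []
    for record in records:
        for group in groups:
            if similar(record, group[0], threshold):
                group.append(record)
                break
        else:
            # No existing group matched, so start a new one.
            groups.append([record])
    return groups


# Hypothetical records containing spelling and punctuation variants.
records = ["Microsoft Corp", "Microsoft Corp.", "Microsft Corp", "Oracle Inc"]
groups = group_duplicates(records)
print(groups)
```

The greedy pass compares each record only against group representatives, which keeps the sketch short but makes the result depend on input order; a production operator would need a more careful strategy.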


very large data bases | 2015

SEMA-JOIN: joining semantically-related tables using big table corpora

Yeye He; Kris Ganjam; Xu Chu

Join is a powerful operator that combines records from two or more tables, which is of fundamental importance in the field of relational databases. However, traditional join processing mostly relies on string equality comparisons. Given the growing demand for ad-hoc data analysis, we have seen an increasing number of scenarios where the desired join relationship is not equi-join. For example, in a spreadsheet environment, a user may want to join one table with a subject column country-name, with another table with a subject column country-code. Traditional equi-join cannot handle such joins automatically, and the user typically has to manually find an intermediate mapping table in order to perform the desired join. We develop a SEMA-JOIN approach that is a first step toward allowing users to perform semantic join automatically, with a click of a button. Our main idea is to utilize a data-driven method that leverages a big table corpus with over 100 million tables to determine statistical correlation between cell values at both row-level and column-level. We use the intuition that the correct join mapping is the one that maximizes aggregate pairwise correlation, to formulate the join prediction problem as an optimization problem. We develop a linear program relaxation and a rounding argument to obtain a 2-approximation algorithm in polynomial time. Our evaluation using both public tables from the Web and proprietary Enterprise tables from a large company shows that the proposed approach can perform automatic semantic joins with high precision for a variety of common join scenarios.
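The intuition that the correct join mapping maximizes aggregate pairwise correlation can be sketched on a toy example. The code below enumerates every candidate mapping by brute force instead of using the linear program relaxation and rounding described in the abstract, and all correlation scores in `ROW_CORR` and `COLUMN_CORR` are made-up values, not statistics from any real table corpus.

```python
from itertools import combinations, product

# Hypothetical row-level correlation: how strongly a left value
# co-occurs in the same row with a candidate right value.
ROW_CORR = {
    ("Germany", "DE"): 0.9, ("Germany", "GER"): 0.8,
    ("France", "FR"): 0.9, ("France", "FRA"): 0.7,
    ("Japan", "JP"): 0.8, ("Japan", "JPN"): 0.9,
}

# Hypothetical column-level correlation: how strongly two right
# values co-occur in the same column. Unlisted pairs default to 0.1.
COLUMN_CORR = {frozenset(p): 0.9 for p in combinations(["DE", "FR", "JP"], 2)}
COLUMN_CORR.update(
    {frozenset(p): 0.9 for p in combinations(["GER", "FRA", "JPN"], 2)}
)


def best_mapping(left_values, candidates):
    """Pick one right value per left value so that the sum of
    row-level and column-level correlations is maximal."""
    best, best_score = None, float("-inf")
    for choice in product(*(candidates[v] for v in left_values)):
        row = sum(ROW_CORR[(l, r)] for l, r in zip(left_values, choice))
        col = sum(
            COLUMN_CORR.get(frozenset(pair), 0.1)
            for pair in combinations(choice, 2)
        )
        if row + col > best_score:
            best, best_score = dict(zip(left_values, choice)), row + col
    return best


left = ["Germany", "France", "Japan"]
candidates = {
    "Germany": ["DE", "GER"],
    "France": ["FR", "FRA"],
    "Japan": ["JP", "JPN"],
}
mapping = best_mapping(left, candidates)
print(mapping)
```

Looking at row-level scores alone would map Japan to "JPN", but the column-level term rewards mappings whose right-hand values belong together, so the two-letter codes win as a set. Brute force is exponential in the number of rows, which is why an approximation algorithm matters at scale.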


international conference on management of data | 2018

Transform-Data-by-Example (TDE): Extensible Data Transformation in Excel

Yeye He; Kris Ganjam; Kukjin Lee; Yue Wang; Vivek R. Narasayya; Surajit Chaudhuri; Xu Chu; Yudian Zheng

Business analysts and data scientists today increasingly need to clean, standardize and transform diverse data sets, such as name, address, date time, phone number, etc., before they can perform analysis. These ad-hoc transformation problems are typically solved by one-off scripts, which are both difficult and time-consuming to write. Our observation is that these domain-specific transformation problems have long been solved by developers with code libraries, which are often shared in places like GitHub. We thus develop an extensible data transformation system called Transform-Data-by-Example (TDE) that can leverage rich transformation logic in source code, DLLs, web services and mapping tables, so that end-users only need to provide a few (typically 3) input/output examples, and TDE can synthesize desired programs using relevant transformation logic from these sources. The beta version of TDE was released in Office Store for Excel.
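The by-example workflow can be illustrated with a very small search: given a few input/output pairs, keep only the candidate functions that reproduce every pair. This is a sketch under strong assumptions, not how TDE works internally; the `CANDIDATES` library, the helper functions, and `synthesize` are hypothetical, and the sketch tests single functions only instead of composing them into programs.

```python
from datetime import datetime


def last_first(name):
    """'John Smith' -> 'Smith, John' (fails on anything but two words)."""
    first, last = name.split()
    return f"{last}, {first}"


def iso_date(text):
    """'03/15/2018' -> '2018-03-15' (fails on non-date input)."""
    return datetime.strptime(text, "%m/%d/%Y").strftime("%Y-%m-%d")


# Hypothetical library of reusable transformation functions.
CANDIDATES = {
    "upper": str.upper,
    "title": str.title,
    "last_first": last_first,
    "iso_date": iso_date,
}


def synthesize(examples):
    """Return the names of candidates consistent with all examples."""
    consistent = []
    for name, fn in CANDIDATES.items():
        try:
            if all(fn(inp) == out for inp, out in examples):
                consistent.append(name)
        except (ValueError, TypeError):
            # The candidate cannot handle this input, so discard it.
            continue
    return consistent


# Three input/output examples supplied by the end user.
examples = [
    ("03/15/2018", "2018-03-15"),
    ("12/01/2017", "2017-12-01"),
    ("07/04/2016", "2016-07-04"),
]
matches = synthesize(examples)
print(matches)
```

Once a consistent function is found, it can be applied to the remaining rows, for example `CANDIDATES[matches[0]]("01/02/2020")`.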


Archive | 2003

Efficient fuzzy match for evaluating data records

Surajit Chaudhuri; Kris Ganjam; Venkatesh Ganti; Rajeev Motwani


international conference on management of data | 2012

InfoGather: entity augmentation and attribute discovery by holistic matching with web tables

Mohamed Yakout; Kris Ganjam; Kaushik Chakrabarti; Surajit Chaudhuri


international world wide web conferences | 2015

Concept Expansion Using Web Tables

Chi Wang; Kaushik Chakrabarti; Yeye He; Kris Ganjam; Zhimin Chen; Philip A. Bernstein


Archive | 2012

Entity augmentation service from latent relational data

Kris Ganjam; Kaushik Chakrabarti; Mohamed Yakout; Surajit Chaudhuri


international conference on management of data | 2008

Incorporating string transformations in record matching

Arvind Arasu; Surajit Chaudhuri; Kris Ganjam; Raghav Kaushik


international conference on management of data | 2015

TEGRA: Table Extraction by Global Record Alignment

Xu Chu; Yeye He; Kaushik Chakrabarti; Kris Ganjam
