Karin Murthy | Researchain

Publication

Featured research published by Karin Murthy.

knowledge discovery and data mining | 2010

DUST: a generalized notion of similarity between uncertain time series

Smruti R. Sarangi; Karin Murthy

Large-scale sensor deployments and the increased use of privacy-preserving transformations have led to growing interest in mining uncertain time series data. Traditional distance measures such as Euclidean distance or dynamic time warping are not always effective for analyzing uncertain time series data. Recently, some measures have been proposed to account for uncertainty in time series data; however, we show in this paper that their applicability is limited. Specifically, these approaches do not provide an intuitive way to compare two uncertain time series and do not easily accommodate multiple error functions. In this paper, we first provide a theoretical framework that generalizes the notion of similarity between uncertain time series. Second, we propose DUST, a novel distance measure that accommodates uncertainty and degenerates to the Euclidean distance when the distance is large compared to the error. We provide an extensive experimental validation of our approach for the following applications: classification, top-k motif search, and top-k nearest-neighbor queries.
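To illustrate how per-point uncertainty can reweight a Euclidean-style comparison, here is a minimal sketch. It is not DUST itself: it is a simpler likelihood-style dissimilarity that assumes independent, zero-mean Gaussian errors with known per-point standard deviations. The function names `euclidean` and `gaussian_weighted_distance` are hypothetical. With the same error everywhere it reduces to a scaled Euclidean distance, and points with larger error contribute less.

```python
import math
from typing import Sequence


def euclidean(x: Sequence[float], y: Sequence[float]) -> float:
    """Plain Euclidean distance between two equal-length series."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def gaussian_weighted_distance(
    x: Sequence[float],
    y: Sequence[float],
    sx: Sequence[float],
    sy: Sequence[float],
) -> float:
    """Hypothetical uncertainty-aware distance (NOT the paper's DUST).

    Assumes each observation is its true value plus independent Gaussian
    noise with the given standard deviation. If the true values are equal,
    the observed difference x_i - y_i has variance sx_i**2 + sy_i**2, and
    its Gaussian negative log-likelihood is, up to a constant,
    (x_i - y_i)**2 / (2 * variance). The per-point terms are summed.
    """
    if not (len(x) == len(y) == len(sx) == len(sy)):
        raise ValueError("all sequences must have the same length")
    total = 0.0
    for xi, yi, sxi, syi in zip(x, y, sx, sy):
        var = sxi ** 2 + syi ** 2
        if var <= 0:
            raise ValueError("standard deviations must not both be zero")
        total += (xi - yi) ** 2 / (2 * var)
    return math.sqrt(total)
```

With a constant error sigma on both series, every term becomes d**2 / (4 * sigma**2), so the result equals the Euclidean distance divided by 2 * sigma; raising the error at a single point shrinks that point's influence.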

very large data bases | 2012

Exploiting evidence from unstructured data to enhance master data management

Karin Murthy; Prasad M. Deshpande; Atreyee Dey; Ramanujam Halasipuram; Mukesh K. Mohania; Deepak P; Jennifer S. Reed; Scott Schumacher

Master data management (MDM) integrates data from multiple structured data sources and builds a consolidated 360-degree view of business entities such as customers and products. Today's MDM systems are not prepared to integrate information from unstructured data sources, such as news reports, emails, call-center transcripts, and chat logs. However, these unstructured data sources may contain valuable information about the same entities known to MDM from the structured data sources. Integrating information from unstructured data into MDM is challenging because textual references to existing MDM entities are often incomplete and imprecise, and the additional entity information extracted from text should not impact the trustworthiness of MDM data. In this paper, we present an architecture for making MDM text-aware and showcase its implementation as IBM InfoSphere MDM Extension for Unstructured Text Correlation, an add-on to IBM InfoSphere Master Data Management Standard Edition. We highlight how MDM benefits from additional evidence found in documents when performing entity resolution and relationship discovery. We experimentally demonstrate the feasibility of integrating information from unstructured data sources into MDM.
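Because textual references to MDM entities are often incomplete, a matcher should score only the attributes a mention actually provides. The following is a minimal sketch that assumes hypothetical attribute names, equal attribute weights, and exact case-insensitive comparison; the abstract does not describe the product's actual matching algorithm. `MasterRecord`, `match_score`, and `resolve` are illustrative names. Evidence extracted from text is kept in a separate field so that it never overwrites trusted master data.

```python
from dataclasses import dataclass, field


@dataclass
class MasterRecord:
    entity_id: str
    attributes: dict[str, str]
    # Evidence found in text is stored separately, never merged into
    # `attributes`, so trusted master data is not altered by extraction.
    text_evidence: list[dict[str, str]] = field(default_factory=list)


def match_score(mention: dict[str, str], record: MasterRecord) -> float:
    """Fraction of shared attributes on which mention and record agree.

    Only attributes present in both are compared; attributes missing from
    the (often incomplete) mention are not counted as mismatches.
    """
    compared = [k for k in mention if k in record.attributes]
    if not compared:
        return 0.0
    agree = sum(
        mention[k].strip().lower() == record.attributes[k].strip().lower()
        for k in compared
    )
    return agree / len(compared)


def resolve(mention, records, threshold=0.75):
    """Return the best-matching record, or None if none reaches the threshold."""
    best = max(records, key=lambda r: match_score(mention, r), default=None)
    if best is None or match_score(mention, best) < threshold:
        return None
    return best
```

A resolved mention could then have any new attributes appended to the record's `text_evidence`, leaving `attributes` untouched.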

annual srii global conference | 2012

Configurable and Extensible Multi-flows for Providing Analytics as a Service on the Cloud

Deepak P; Prasad M. Deshpande; Karin Murthy

Compared to traditional analytics deployment models, cloud-based solutions for business analytics provide numerous advantages, such as avoiding large upfront infrastructure costs and the effort of setting up an in-house analytics team. These advantages of cloud-based service delivery make it particularly attractive for small and medium businesses. Despite these advantages, analytics penetration has been low, particularly in developing regions such as India and China, due to a number of other factors. In this paper, we propose pre-packaged configurable workflows for analytics as a means of making cloud-based analytics more appealing to customers, with a special focus on small and medium businesses in developing regions. We introduce the concept of configurable multi-flows, which non-technical personnel can easily use and customize without being aware of the technical details of the various operators involved in the workflow. Multi-flows comprise an overlap of multiple possible workflows and are easily extensible to include more variations, supporting the evolving needs of customers non-disruptively and incrementally. We detail a case study of the retail sector, where an extensive survey of retail businesses in India revealed that configurable pre-packaged workflows may indeed help improve market penetration. We then identify common analytics needs of retail customers and detail how such tasks can be expressed as configurable multi-flows. Further, we describe a fully functional implementation of our system that supports configurable multi-flows for analytics. Finally, we illustrate the ease of use of configurable multi-flows with several screenshots.
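A configurable multi-flow overlaps several possible workflows, and configuration picks one path through them. The sketch below is hypothetical throughout: the `Step` model, the operator names, and the `configure` function are illustrative assumptions, not the system described in the paper. It shows how a user's choices resolve to one concrete workflow and how adding a variant extends the multi-flow without disturbing existing configurations.

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    variants: dict[str, str]  # option label -> operator name (hypothetical)
    default: str
    optional: bool = False


def configure(multi_flow: list[Step], choices: dict[str, str]) -> list[str]:
    """Resolve a multi-flow into one concrete, linear workflow.

    `choices` maps a step name to an option label; steps without a choice
    use their default. An optional step can be skipped with the label "skip".
    """
    workflow = []
    for step in multi_flow:
        label = choices.get(step.name, step.default)
        if label == "skip":
            if not step.optional:
                raise ValueError(f"step {step.name!r} cannot be skipped")
            continue
        if label not in step.variants:
            raise ValueError(f"unknown option {label!r} for step {step.name!r}")
        workflow.append(step.variants[label])
    return workflow


# Hypothetical retail multi-flow: two data sources, an optional cleaning step,
# and two alternative analyses sharing the same loading/cleaning prefix.
RETAIL = [
    Step("load", {"csv": "load_csv", "pos": "load_pos_export"}, default="csv"),
    Step("clean", {"basic": "drop_invalid_rows"}, default="basic", optional=True),
    Step("analyze", {"basket": "market_basket", "segments": "customer_segmentation"},
         default="basket"),
]

print(configure(RETAIL, {"analyze": "segments", "clean": "skip"}))
```

Adding, say, a new analysis variant only adds an entry to one step's `variants`; configurations that never select it keep resolving to the same workflow.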

web information systems engineering | 2012

Improving recall of regular expressions for information extraction

Karin Murthy; Deepak P; Prasad M. Deshpande

Learning or writing regular expressions that identify instances of a specific concept within text documents with high precision and recall is challenging. It is relatively easy to improve the precision of an initial regular expression by identifying the false positives it matches and tweaking the expression to avoid them. However, modifying the expression to improve recall is difficult: in the absence of tools to identify the missing instances, false negatives can only be found by manually analyzing all documents. We focus on partially automating the discovery of missing instances by soliciting minimal user feedback. We present a technique to identify good generalizations of a regular expression that improve recall while retaining high precision. We empirically demonstrate the effectiveness of the proposed technique compared to existing methods and show results on real-world datasets for a variety of tasks, such as identifying dates, phone numbers, product names, and course numbers.
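The precision/recall trade-off described above can be measured directly on a small labeled sample. This minimal sketch assumes a hand-labeled text and a hand-written candidate generalization; the paper's technique instead discovers generalizations automatically with minimal user feedback. `precision_recall`, `TEXT`, and `GOLD` are hypothetical names.

```python
import re


def precision_recall(pattern: str, text: str, gold: set[str]) -> tuple[float, float]:
    """Precision and recall of a regex's distinct matches against gold instances."""
    found = set(re.findall(pattern, text))
    tp = len(found & gold)
    precision = tp / len(found) if found else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall


TEXT = "Due 03/14/2024, moved to 3-15-2024; ref 12/34/56; call 555-1234."
GOLD = {"03/14/2024", "3-15-2024"}  # hand-labeled date instances

initial = r"\b\d{2}/\d{2}/\d{4}\b"
# Hand-written generalization: 1-2 digit day/month and '/' or '-' separators.
generalized = r"\b\d{1,2}[/-]\d{1,2}[/-]\d{4}\b"

for name, pat in [("initial", initial), ("generalized", generalized)]:
    p, r = precision_recall(pat, TEXT, GOLD)
    print(f"{name}: precision={p:.2f} recall={r:.2f}")
```

Here the initial expression finds only the first date (precision 1.00, recall 0.50), while the generalized one also finds `3-15-2024` without matching the reference number or the phone number (precision 1.00, recall 1.00).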

network operations and management symposium | 2016

Towards establishing causality between change and incident

Sinem Guven; Karin Murthy; Larisa Shwartz; Amit M. Paradkar

It is common knowledge in the IT service domain that changes to the system configuration are responsible for a major portion of incidents that result in client outages. However, it is typically very difficult to establish a relationship between changes and incidents, because proper documentation takes lower priority, both at change creation time and during incident management, under the tremendous time pressure to implement changes and resolve incidents quickly. As a result, it is often not possible to leverage historical data to perform retrospective analysis that identifies emerging trends linking changes to incidents, or to build predictive models for proactive incident prevention at change creation time. In this paper, we present an approach for establishing causality between changes and incidents through an ensemble of statistics, data classification, and natural language processing techniques. We demonstrate our approach with a real-world example.
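One simple building block for linking changes to incidents is candidate generation: pair each incident with changes made to the same server shortly before it, then rank them by textual similarity. This is a minimal sketch; the record fields, the 24-hour window, and Jaccard token overlap are hypothetical choices and stand in for, rather than reproduce, the paper's statistical, classification, and NLP components.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Change:
    change_id: str
    server: str
    time: datetime
    text: str


@dataclass
class Incident:
    incident_id: str
    server: str
    time: datetime
    text: str


def tokens(s: str) -> set[str]:
    """Lower-cased whitespace tokens."""
    return set(s.lower().split())


def jaccard(a: set[str], b: set[str]) -> float:
    """Size of intersection over size of union (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def candidate_causes(incident, changes, window=timedelta(hours=24)):
    """Changes on the incident's server within `window` before it, ranked by text overlap."""
    scored = []
    for ch in changes:
        lag = incident.time - ch.time
        if ch.server == incident.server and timedelta(0) <= lag <= window:
            scored.append((jaccard(tokens(ch.text), tokens(incident.text)), ch.change_id))
    return sorted(scored, reverse=True)
```

The time-window and same-server filters keep the candidate set small; a real ensemble would add further evidence before claiming causality.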

conference on network and service management | 2016

Identifying resources for cloud garbage collection

Zhiming Shen; Christopher C. Young; Sai Zeng; Karin Murthy; Kun Bai

Infrastructure as a Service (IaaS) clouds provide users with the ability to provision servers easily and quickly. A recent study found that one in three data center servers continues to consume resources without producing any useful work. A number of techniques have been proposed to identify such unproductive instances. However, these approaches identify idle cloud instances based on resource utilization, and resource utilization alone can be a misleading indicator, especially in enterprise cloud environments. In this paper, we present Pleco, a tool that detects unproductive instances in IaaS clouds. Pleco captures dependency information between users and cloud instances by constructing a weighted reference model based on application knowledge. To handle cases of insufficient application knowledge, Pleco also supplements its dependency results with a machine learning model trained on resource utilization data. Pleco gives a confidence level and justification for each identified unproductive instance. Cloud administrators can then take different actions according to the information provided by Pleco. Pleco is lightweight and requires no modification to existing applications.
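The weighted-reference idea can be sketched as scoring the evidence that someone still depends on an instance and attaching a confidence and justification to each verdict. This is a minimal sketch under assumed inputs: the reference types, weights, threshold, and confidence formula are all hypothetical, and neither Pleco's actual model nor its machine-learning fallback is reproduced.

```python
# Hypothetical reference types and weights: a higher weight means stronger
# evidence that a user or another system still depends on the instance.
REFERENCE_WEIGHTS = {
    "recent_user_login": 0.5,
    "dns_record": 0.2,
    "inbound_connections": 0.3,
    "scheduled_job": 0.2,
}


def assess(instance_refs: set[str], threshold: float = 0.3):
    """Return (is_unproductive, confidence, justification) for one instance."""
    score = min(1.0, sum(REFERENCE_WEIGHTS.get(r, 0.0) for r in instance_refs))
    unproductive = score < threshold
    # Confidence grows with the distance of the score from the threshold.
    if unproductive:
        confidence = (threshold - score) / threshold
    else:
        confidence = (score - threshold) / (1.0 - threshold)
    found = sorted(instance_refs & set(REFERENCE_WEIGHTS))
    justification = (
        "references found: " + ", ".join(found) if found else "no references found"
    )
    return unproductive, round(confidence, 2), justification
```

An instance with no known references is flagged with full confidence, while one that only has a DNS record is flagged with lower confidence and a justification an administrator can check.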

Immunotechnology | 2017

COACH: Cognitive analytics for change

Sinem Guven; Pawel Jasionowski; Karin Murthy; Krishna Tunga; George E. Stark

This paper presents our initial efforts toward building a cognitive analytics framework for change management. We propose a novel predictive algorithm for change risk calculation based on historical change failures, server failures, and change-triggered incidents, as well as expert user input. When tested with real client account data, our predictive algorithm provides a significant improvement over traditional risk assessments in proactively capturing problematic changes.
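The abstract names the risk factors (historical change failures, server failures, change-triggered incidents, and expert input) but not how they are combined. The following minimal sketch assumes a hypothetical weighted blend of normalized historical rates with an optional expert estimate; the field names, weights, and the cap of 10 server failures are illustrative only and are not the paper's predictive algorithm.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChangeHistory:
    similar_changes: int               # past changes of the same type on this server
    failed_changes: int                # how many of those failed
    server_failures: int               # server outages in the look-back period
    incidents_after_change: int        # incidents traced to past changes
    expert_risk: Optional[float] = None  # optional expert estimate in [0, 1]


def risk_score(h: ChangeHistory, weights=(0.4, 0.2, 0.4), expert_weight=0.5) -> float:
    """Blend historical rates into a risk in [0, 1] using hypothetical weights."""
    n = max(h.similar_changes, 1)  # avoid division by zero for new change types
    change_failure_rate = min(h.failed_changes / n, 1.0)
    server_instability = min(h.server_failures / 10, 1.0)  # assumed cap: 10 failures
    incident_rate = min(h.incidents_after_change / n, 1.0)
    w1, w2, w3 = weights
    historical = w1 * change_failure_rate + w2 * server_instability + w3 * incident_rate
    if h.expert_risk is None:
        return historical
    # Expert input is blended in rather than replacing the historical estimate.
    return (1 - expert_weight) * historical + expert_weight * h.expert_risk
```

Because the default weights sum to 1 and each rate is capped at 1, the score stays in [0, 1]; an expert estimate shifts it without discarding the historical evidence.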

Knowledge and Information Systems | 2015

The Mask of ZoRRo: preventing information leakage from documents

Prasad M. Deshpande; Salil Joshi; Prateek Dewan; Karin Murthy; Mukesh K. Mohania; Sheshnarayan Agrawal

In today’s enterprise world, information about business entities, such as a customer’s or patient’s name, address, and social security number, is often present in both relational databases and content repositories. Information about such business entities is generally well protected in databases by well-defined, fine-grained access control. However, current document retrieval systems do not provide user-specific, fine-grained redaction of documents to prevent leakage of information about business entities. This leaves companies with only two choices: either provide complete access to a document, risking potential information leakage, or prohibit access to the document altogether, accepting a potentially negative impact on business processes. In this paper, we present ZoRRo, an add-on for document retrieval systems that dynamically redacts sensitive information about business entities referenced in a document, based on the access control defined for those entities. ZoRRo exploits database systems’ fine-grained, label-based access-control mechanism to identify and redact sensitive information from unstructured text, based on the access privileges of the user viewing it. To make on-the-fly redaction feasible, ZoRRo exploits the concept of …
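The core idea of user-specific redaction, masking entity attributes a user is not cleared to see, can be sketched with a per-attribute label check. This minimal sketch assumes a hypothetical in-memory entity table and label names in place of the database's label-based access control, and uses naive exact string matching to locate attribute values in text; ZoRRo's own technique for making redaction efficient on the fly is not shown.

```python
# Hypothetical entity table: entity -> attribute -> (value, security label).
ENTITIES = {
    "patient-17": {
        "name": ("Asha Rao", "PUBLIC"),
        "ssn": ("123-45-6789", "RESTRICTED"),
        "address": ("12 MG Road", "CONFIDENTIAL"),
    }
}


def redact(text: str, user_labels: set[str]) -> str:
    """Mask every entity attribute value whose label the user does not hold.

    Uses exact string matching, so variant spellings of a value in the
    document would not be caught; this is a simplifying assumption.
    """
    for attributes in ENTITIES.values():
        for attr, (value, label) in attributes.items():
            if label not in user_labels:
                text = text.replace(value, f"[{attr.upper()} REDACTED]")
    return text
```

The same document thus renders differently per viewer: a user holding only `PUBLIC` sees the name but not the SSN or address, while a user holding all three labels sees the original text.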

Archive | 2012