Publication


Featured research published by David C. Kale.


Knowledge Discovery and Data Mining | 2015

Deep Computational Phenotyping

Zhengping Che; David C. Kale; Wenzhe Li; Mohammad Taha Bahadori; Yan Liu

We apply deep learning to the problem of discovery and detection of characteristic patterns of physiology in clinical time series data. We propose two novel modifications to standard neural net training that address challenges and exploit properties that are peculiar, if not exclusive, to medical data. First, we examine a general framework for using prior knowledge to regularize parameters in the topmost layers. This framework can leverage priors of any form, ranging from formal ontologies (e.g., ICD9 codes) to data-derived similarity. Second, we describe a scalable procedure for training a collection of neural networks of different sizes but with partially shared architectures. Both of these innovations are well-suited to medical applications, where available data are not yet Internet-scale and have many sparse outputs (e.g., rare diagnoses) but also have exploitable structure (e.g., temporal order and relationships between labels). However, both techniques are sufficiently general to be applied to other problems and domains. We demonstrate the empirical efficacy of both techniques on two real-world hospital data sets and show that the resulting neural nets learn interpretable and clinically relevant features.
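The abstract does not spell out the form of the prior-based regularizer. One common way to encode a label-similarity prior on the topmost layer is a graph-Laplacian penalty that pulls the output weights of similar labels toward each other. The sketch below assumes that form, with a hypothetical similarity matrix `S` (for example, one derived from shared ICD9 ancestry); it is an illustration, not necessarily the paper's exact formulation, and the function names are hypothetical.

```python
import numpy as np


def laplacian_penalty(W: np.ndarray, S: np.ndarray) -> float:
    """Graph-Laplacian penalty tying together top-layer weights of similar labels.

    W: (num_labels, hidden_dim) top-layer weight matrix, one row per label.
    S: (num_labels, num_labels) symmetric, non-negative similarity matrix
       (a hypothetical prior, e.g. from an ontology or from data).
    Returns trace(W^T L W) with L = D - S, which equals
    0.5 * sum_{i,j} S[i, j] * ||W[i] - W[j]||^2.
    """
    L = np.diag(S.sum(axis=1)) - S          # graph Laplacian of the label graph
    return float(np.trace(W.T @ L @ W))


def pairwise_penalty(W: np.ndarray, S: np.ndarray) -> float:
    """Direct (slower) pairwise form of the same penalty, useful as a check."""
    diffs = W[:, None, :] - W[None, :, :]   # (n, n, d) pairwise weight differences
    return float(0.5 * np.sum(S * np.sum(diffs ** 2, axis=-1)))
```

During training, such a penalty would be added to the task loss with a tuning weight, e.g. `loss = task_loss + lam * laplacian_penalty(W, S)`, where `lam` is likewise an assumption. Labels with large `S[i, j]` are pushed toward similar output weights, which lets rare labels borrow strength from related, more frequent ones.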


International Health Informatics Symposium | 2012

Unsupervised pattern discovery in electronic health care data using probabilistic clustering models

Benjamin M. Marlin; David C. Kale; Robinder G. Khemani; Randall C. Wetzel

Bedside clinicians routinely identify temporal patterns in physiologic data in the process of choosing and administering treatments intended to alter the course of critical illness for individual patients. Our primary interest is the study of unsupervised learning techniques for automatically uncovering such patterns from the physiologic time series data contained in electronic health care records. These data are sparse, high-dimensional, and often both uncertain and incomplete. In this paper, we develop and study a probabilistic clustering model designed to mitigate the effects of temporal sparsity inherent in electronic health care records data. We evaluate the model qualitatively by visualizing the learned cluster parameters and quantitatively in terms of its ability to predict mortality outcomes associated with patient episodes. Our results indicate that the model can discover distinct, recognizable physiologic patterns with prognostic significance.
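The paper's model is not reproduced here. As a hedged illustration of one standard way a probabilistic clustering model can cope with sparse, incomplete records, the sketch below fits a diagonal-covariance Gaussian mixture by EM and marginalizes out missing entries (encoded as NaN): because the covariance is diagonal, the likelihood of the observed dimensions is just the product over those dimensions. The function name and the percentile-based initialization are illustrative assumptions.

```python
import numpy as np


def fit_gmm_missing(X, k, n_iter=50, var_floor=1e-6):
    """Hypothetical sketch: EM for a diagonal Gaussian mixture with NaN = missing.

    X: (n, d) array; NaN marks an unobserved measurement.
    Returns (weights, means, variances, responsibilities, loglik_history).
    """
    obs = ~np.isnan(X)                      # (n, d) observation mask
    Xf = np.where(obs, X, 0.0)              # zero-fill; masked sums ignore these
    # Deterministic init: spread initial means across per-feature percentiles.
    means = np.stack([np.nanpercentile(X, q, axis=0) for q in np.linspace(10, 90, k)])
    variances = np.tile(np.nanvar(X, axis=0) + var_floor, (k, 1))
    weights = np.full(k, 1.0 / k)
    history = []
    for _ in range(n_iter):
        # E-step: log p(x_obs | cluster) sums only over observed dimensions.
        diff2 = (Xf[:, None, :] - means[None]) ** 2                        # (n, k, d)
        ll_dim = -0.5 * (np.log(2 * np.pi * variances[None]) + diff2 / variances[None])
        log_p = np.log(weights)[None] + np.sum(ll_dim * obs[:, None, :], axis=2)  # (n, k)
        log_norm = np.logaddexp.reduce(log_p, axis=1)                      # log p(x_obs)
        history.append(float(log_norm.sum()))
        resp = np.exp(log_p - log_norm[:, None])                           # responsibilities
        # M-step: responsibility-weighted statistics over observed entries only.
        w_obs = resp[:, :, None] * obs[:, None, :]                         # (n, k, d)
        denom = w_obs.sum(axis=0) + 1e-12
        means = (w_obs * Xf[:, None, :]).sum(axis=0) / denom
        diff2 = (Xf[:, None, :] - means[None]) ** 2
        variances = np.maximum((w_obs * diff2).sum(axis=0) / denom, var_floor)
        weights = resp.mean(axis=0)
    return weights, means, variances, resp, history
```

Marginalizing rather than imputing means a record with few measurements simply contributes less evidence; it is not pulled toward an arbitrary fill value. In a mortality-prediction evaluation like the one described, the responsibilities `resp` could serve as features for a downstream classifier.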


International Conference on Data Mining | 2014

An Examination of Multivariate Time Series Hashing with Applications to Health Care

David C. Kale; Dian Gong; Zhengping Che; Yan Liu; Gérard G. Medioni; Randall C. Wetzel; Patrick A. Ross

As large-scale multivariate time series data become increasingly common in application domains such as health care and traffic analysis, researchers are challenged to build efficient tools to analyze them and provide useful insights. Similarity search, a basic operator for many machine learning and data mining algorithms, has been studied extensively, leading to several efficient solutions. However, similarity search for multivariate time series data is intrinsically challenging because (1) there is no conclusive agreement on what constitutes a good similarity metric for multivariate time series data and (2) calculating similarity scores between two time series is often computationally expensive. In this paper, we address this problem by applying a generalized hashing framework, namely kernelized locality sensitive hashing, to accelerate time series similarity search with a series of representative similarity metrics. Experimental results on three large-scale clinical data sets demonstrate the effectiveness of the proposed approach.
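Kernelized LSH hashes items using random directions in a kernel-induced feature space, so it can be paired with many similarity metrics. As a simpler, swapped-in stand-in that shows the same hash-then-rerank idea, the sketch below uses plain random-hyperplane LSH (the sign of random projections, which approximates cosine similarity) on flattened, per-channel z-normalized series of a fixed length. The class and function names are hypothetical, and unlike the paper's approach this version does not support an arbitrary similarity metric.

```python
import numpy as np


def znorm(series: np.ndarray) -> np.ndarray:
    """Z-normalize each channel of a (T, d) multivariate series."""
    mu = series.mean(axis=0)
    sd = series.std(axis=0)
    sd = np.where(sd > 0, sd, 1.0)          # avoid division by zero on constant channels
    return (series - mu) / sd


class SeriesLSH:
    """Hypothetical random-hyperplane LSH over fixed-length (T, d) series."""

    def __init__(self, length: int, channels: int, n_bits: int = 32, seed: int = 0):
        rng = np.random.default_rng(seed)
        # One Gaussian random hyperplane per hash bit.
        self.planes = rng.standard_normal((n_bits, length * channels))

    def signature(self, series: np.ndarray) -> np.ndarray:
        """Binary code: bit b is 1 if the series lies on the positive side of plane b."""
        v = znorm(series).ravel()
        return (self.planes @ v > 0).astype(np.uint8)


def hamming(a: np.ndarray, b: np.ndarray) -> int:
    """Number of differing bits between two codes."""
    return int(np.count_nonzero(a != b))
```

In use, series whose codes match (or lie within a small Hamming distance) become candidates, and only those candidates are re-ranked with the expensive similarity metric; this filtering step is where hashing saves time.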


Veterinary Surgery | 2013

Hemilaminectomy for thoracolumbar Hansen Type I intervertebral disk disease in ambulatory dogs with or without neurologic deficits: 39 cases (2008–2010)

Elizabeth A. Ingram; David C. Kale; Raviv J. Balfour

OBJECTIVE To describe prognostic factors, outcome, and time to recovery among ambulatory dogs having hemilaminectomy for Hansen Type I intervertebral disk disease. STUDY DESIGN Retrospective case series. ANIMALS Dogs (n = 38; 39 hemilaminectomies). METHODS Medical records (January 2008-May 2010) of all dogs that underwent hemilaminectomy for Hansen Type I intervertebral disk disease were reviewed. Records for dogs that were ambulatory preoperatively were analyzed for signalment, duration and severity of signs, presence of neurologic deficits, and postoperative outcome. Dogs were categorized based on Frankel score and subcategorized by their level of conscious proprioceptive (CP) deficit. Postoperatively, the time to ambulation and the time to regain normal CP responses were recorded. Results for each group were compared using a χ² test and considered significant when P < .05. Recovery times were analyzed using a Cox proportional hazards model. RESULTS Seven dogs were categorized as modified Frankel grade I preoperatively and 32 dogs as grade II with varying levels of deficits (1 of these dogs had previously undergone surgery as grade II and was operated on again as grade II). An increasing degree of preoperative CP deficit was significantly correlated with longer time to ambulation (P = .005) as well as longer time to normal CP (P = .01). Duration of signs was not significantly correlated with time to ambulation or neurologic recovery for either grade I or grade II dogs. CONCLUSIONS Most dogs recovered well with surgical decompression. An increasing degree of preoperative deficits is significantly correlated with longer recovery time.


International Conference on Data Mining | 2013

Accelerating Active Learning with Transfer Learning

David C. Kale; Yan Liu

Active learning, transfer learning, and related techniques are unified by a core theme: efficient and effective use of available data. Active learning offers scalable solutions for building effective supervised learning models while minimizing annotation effort. Transfer learning utilizes existing labeled data from one task to help learn related tasks for which limited labeled data are available. There has been limited research, however, on how to combine these two techniques. In this paper, we present a simple and principled transfer active learning framework that leverages pre-existing labeled data from related tasks to improve the performance of an active learner. We derive an intuitive bound on the generalization error of the classifiers learned by this algorithm that provides insight into the algorithm's behavior and the problem in general. Experimental results using several well-known transfer learning data sets confirm our theoretical analysis and demonstrate the effectiveness of our approach.


Respiratory Care | 2014

Algorithms to Estimate PaCO2 and pH Using Noninvasive Parameters for Children with Hypoxemic Respiratory Failure

Robinder G. Khemani; E. B. Celikkaya; C. R. Shelton; David C. Kale; Patrick A. Ross; Randall C. Wetzel; Christopher J. L. Newth

BACKGROUND: Ventilator management for children with hypoxemic respiratory failure may benefit from ventilator protocols, which rely on blood gases. Accurate noninvasive estimates for pH or PaCO2 could allow frequent ventilator changes to optimize lung-protective ventilation strategies. If these models are highly accurate, they can facilitate the development of closed-loop ventilator systems. We sought to develop and test algorithms for estimating pH and PaCO2 from measures of ventilator support, pulse oximetry, and end-tidal carbon dioxide pressure (PETCO2). We also sought to determine whether surrogates for changes in dead space can improve prediction. METHODS: Algorithms were developed and tested using 2 data sets from previously published investigations. A baseline model estimated pH and PaCO2 from PETCO2 using the previously observed relationship between PETCO2 and PaCO2 or pH (using the Henderson-Hasselbalch equation). We developed a multivariate Gaussian process (MGP) model incorporating other available noninvasive measurements. RESULTS: The training data set had 2,386 observations from 274 children, and the testing data set had 658 observations from 83 children. The baseline model predicted PaCO2 within ± 7 mm Hg of the observed PaCO2 80% of the time. The MGP model improved this to ± 6 mm Hg. When the MGP model predicted PaCO2 between 35 and 60 mm Hg, the 80% prediction interval narrowed to ± 5 mm Hg. The baseline model predicted pH within ± 0.07 of the observed pH 80% of the time. The MGP model improved this to ± 0.05. CONCLUSIONS: We have demonstrated a conceptual first step for predictive models that estimate pH and PaCO2 to facilitate clinical decision making for children with lung injury. These models may have some applicability when incorporated in ventilator protocols to encourage practitioners to maintain permissive hypercapnia when using high ventilator support. Refinement with additional data may improve model accuracy.
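The baseline described above can be sketched with the standard bicarbonate-buffer form of the Henderson-Hasselbalch equation, pH = 6.1 + log10([HCO3-] / (0.03 × PaCO2)), with HCO3- in mmol/L and PaCO2 in mm Hg. Two assumptions are labeled here: the PETCO2-to-PaCO2 relationship is modeled as a fixed additive gradient supplied by the caller (a hypothetical simplification, not the study's fitted relationship), and bicarbonate must come from elsewhere. The hypothetical `interval_half_width` helper shows how a figure such as "within ± 7 mm Hg 80% of the time" can be computed from paired predictions and observations.

```python
import numpy as np

PK_BICARB = 6.1         # apparent pK of the carbonic acid / bicarbonate buffer
CO2_SOLUBILITY = 0.03   # CO2 solubility, mmol/L per mm Hg


def paco2_from_petco2(petco2_mmhg, gradient_mmhg):
    """Hypothetical baseline: PaCO2 = PETCO2 + a fixed arterial-to-end-tidal gradient."""
    return np.asarray(petco2_mmhg, dtype=float) + gradient_mmhg


def ph_henderson_hasselbalch(hco3_mmol_l, paco2_mmhg):
    """pH = 6.1 + log10(HCO3- / (0.03 * PaCO2))."""
    hco3 = np.asarray(hco3_mmol_l, dtype=float)
    paco2 = np.asarray(paco2_mmhg, dtype=float)
    return PK_BICARB + np.log10(hco3 / (CO2_SOLUBILITY * paco2))


def interval_half_width(pred, obs, coverage=0.80):
    """Half-width X of the band pred ± X that contains `coverage` of the observations."""
    abs_err = np.abs(np.asarray(pred, dtype=float) - np.asarray(obs, dtype=float))
    return float(np.quantile(abs_err, coverage))
```

For typical normal values, `ph_henderson_hasselbalch(24, 40)` gives 6.1 + log10(20) ≈ 7.40. Because pH depends on the logarithm of PaCO2, an error of a few mm Hg in the PETCO2-based estimate translates into a pH error of a few hundredths, which is the scale of the ± 0.07 and ± 0.05 figures reported above.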


Electronic Imaging | 2009

Estimating the position of illuminants in paintings under weak model assumptions: An application to the works of two Baroque masters

David C. Kale; David G. Stork

The problems of estimating the position of an illuminant and the direction of illumination in realist paintings have been addressed using algorithms from computer vision. These algorithms fall into two general categories. In model-independent methods (cast-shadow analysis, occluding-contour analysis, ...), one does not need to know or assume the three-dimensional shapes of the objects in the scene. In model-dependent methods (shape-from-shading, full computer graphics synthesis, ...), one does need to know or assume the three-dimensional shapes. We explore the intermediate- or weak-model condition, where the three-dimensional object rendered is so simple that one can very confidently assume its three-dimensional shape and, further, that this shape admits an analytic derivation of the appearance model. Specifically, we can assume that floors and walls are flat and that they are horizontal and vertical, respectively. We derived the maximum-likelihood estimator for the two-dimensional spatial location of a point source in an image as a function of the pattern of brightness (or grayscale value) over such a planar surface. We applied our methods to two Baroque paintings for which the question of the illuminant position is of interest to art historians: Georges de la Tour's Christ in the Carpenter's Studio (1645) and Caravaggio's The Calling of St. Matthew (1599-1600). Our analyses show that a single point source (somewhat near the depicted candle) is a slightly better explanation of the pattern of brightness on the floor in Christ than are two point sources, one in place of each of the figures. The luminance pattern on the rear wall in The Calling implies that the source is local, a few meters outside the picture frame, not the infinitely distant sun. Both results are consistent with previous rebuttals of the recent art-historical claim that these paintings were executed by tracing optically projected images.
Our method is the first application of such weak-model methods for inferring the location of illuminants in realist paintings and should find use in other questions in the history of art.
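As a hedged illustration of the weak-model idea, the sketch below estimates a point source's horizontal position from the brightness pattern on a flat floor. It assumes a Lambertian floor, an isotropic point source at a known height `h`, known floor coordinates for each brightness sample (in a painting these would have to come from the perspective geometry), and independent Gaussian noise, under which least squares coincides with maximum likelihood. Under these assumptions the irradiance at a floor point is proportional to cos θ / r² = h / r³. The grid search and all names are hypothetical and not the paper's estimator, which works in two-dimensional image coordinates.

```python
import numpy as np


def floor_irradiance(px, py, x0, y0, h):
    """Relative irradiance on a Lambertian floor from an isotropic point source.

    Source at (x0, y0, h) above the floor plane z = 0; irradiance ∝ h / r^3.
    """
    r2 = (px - x0) ** 2 + (py - y0) ** 2 + h ** 2
    return h / r2 ** 1.5


def estimate_source(px, py, brightness, xs, ys, h):
    """Hypothetical grid-search least-squares (Gaussian-noise MLE) estimate of (x0, y0).

    The unknown overall gain (source power times albedo) is fit in closed form
    for each candidate position, so only the position is searched.
    """
    best, best_err = None, np.inf
    for x0 in xs:
        for y0 in ys:
            f = floor_irradiance(px, py, x0, y0, h)
            gain = (f @ brightness) / (f @ f)       # least-squares gain for this candidate
            err = np.sum((brightness - gain * f) ** 2)
            if err < best_err:
                best, best_err = (x0, y0), err
    return best
```

Fitting the gain in closed form reflects the fact that a painting's absolute brightness carries no information about source power; only the shape of the falloff across the floor constrains the source position.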


Knowledge Discovery and Data Mining | 2017

Collecting and Analyzing Millions of mHealth Data Streams

Tom Quisel; Luca Foschini; Alessio Signorini; David C. Kale

Players across the health ecosystem are initiating studies of thousands, even millions, of participants to gather diverse types of data, including biomedical, behavioral, and lifestyle data, in order to advance medical research. These efforts to collect multi-modal data sets on large cohorts coincide with the rise of broad activity and behavior tracking across industries, particularly in healthcare and the growing field of mobile health (mHealth). Government- and pharmaceutical-sponsored studies, as well as patient-driven group studies, in this arena leverage the ability of mobile technology to continuously track behaviors and environmental factors with minimal participant burden. However, the adoption of mHealth has been constrained by the lack of robust solutions for large-scale data collection in free-living conditions and by concerns about data quality. In this work, we describe the infrastructure Evidation Health has developed to collect mHealth data from millions of users through hundreds of different mobile devices and apps. Additionally, we provide evidence of the utility of the data for inferring individual traits pertaining to health, wellness, and behavior. To this end, we introduce and evaluate deep neural network models that achieve high prediction performance without requiring any feature engineering when trained directly on the densely sampled multivariate mHealth time series data. We believe that the present work substantiates both the feasibility and the utility of creating a very large mHealth research cohort, as envisioned by the many large cohort studies currently underway across therapeutic areas and conditions.


Computer-Based Medical Systems | 2011

An informatics architecture for the Virtual Pediatric Intensive Care Unit

Daniel J. Crichton; Chris A. Mattmann; Andrew F. Hart; David C. Kale; Robinder G. Khemani; Patrick A. Ross; Sarah Rubin; Paul Veeravatanayothin; Amy Braverman; Cameron Goodale; Randall C. Wetzel

The Laura P. and Leland K. Whittier Virtual Pediatric Intensive Care Unit (VPICU) is an ambitious research network focused on building online databases for improving decision-making in pediatric intensive care units. Increasingly, there is a need to unify the previously distributed and heterogeneous information captured in these databases to support both traditional retrospective studies and ad-hoc analyses. VPICU and NASA's Jet Propulsion Laboratory have constructed a reference architecture and implementation framework that addresses these needs. The architecture is unobtrusive, scalable, and secure, with a strong focus on rapid deployment and integration. This paper reports on the current status of our efforts and illustrates the strength of the framework through our recent work on unsupervised discovery of patient similarity within the hospital.


Mobile Health - Sensors, Analytic Methods, and Applications | 2017

Time Series Feature Learning with Applications to Health Care

Zhengping Che; Sanjay Purushotham; David C. Kale; Wenzhe Li; Mohammad Taha Bahadori; Robinder G. Khemani; Yan Liu

Exponential growth in mobile health devices and electronic health records has resulted in a surge of large-scale time series data, which demands effective and fast machine learning models for analysis and discovery. In this chapter, we discuss a novel framework based on deep learning which automatically performs feature learning from heterogeneous time series data. It is well-suited for healthcare applications, where available data have many sparse outputs (e.g., rare diagnoses) and exploitable structures (e.g., temporal order and relationships between labels). Furthermore, we introduce a simple yet effective knowledge-distillation approach to learn an interpretable model while achieving the prediction performance of deep models. We conduct experiments on several real-world datasets and show the empirical efficacy of our framework and the interpretability of the mimic models.
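The knowledge-distillation idea can be illustrated with a minimal mimic-learning sketch: an interpretable student is fit to the soft outputs (logits) of a trained deep teacher rather than to hard labels. To keep it self-contained, the teacher here is a hypothetical one-hidden-layer network with fixed weights, and the student is an ordinary least-squares linear model; this is a simpler stand-in, and the chapter's mimic models may use a different interpretable model class. All names are hypothetical.

```python
import numpy as np


def teacher_logits(X, W1, b1, w2, b2):
    """Hypothetical 'deep' teacher: one tanh hidden layer with fixed weights."""
    return np.tanh(X @ W1 + b1) @ w2 + b2


def fit_mimic(X, soft_targets):
    """Fit an interpretable linear student to the teacher's logits by least squares."""
    A = np.hstack([X, np.ones((X.shape[0], 1))])    # append an intercept column
    coef, *_ = np.linalg.lstsq(A, soft_targets, rcond=None)
    return coef[:-1], coef[-1]                       # per-feature weights, intercept


def predict_mimic(X, weights, intercept):
    """Student prediction of the teacher's logit for each row of X."""
    return X @ weights + intercept
```

Regressing on the teacher's logits, rather than on 0/1 labels, is one common choice in mimic learning: the logits carry the teacher's graded confidence, which gives the student a richer training signal. Each fitted weight can then be read directly as the student's sensitivity to one input feature.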

Collaboration


Top Co-Authors

Randall C. Wetzel (Children's Hospital Los Angeles)
Yan Liu (University of Southern California)
Robinder G. Khemani (University of Southern California)
Patrick A. Ross (University of Southern California)
Mohammad Taha Bahadori (University of Southern California)
Zachary C. Lipton (Carnegie Mellon University)
Zhengping Che (University of Southern California)
Wenzhe Li (University of Southern California)
Andrew F. Hart (California Institute of Technology)
Aram Galstyan (University of Southern California)