Darya Filippova | Researchain

Publication

Featured research published by Darya Filippova.

Algorithms for Molecular Biology | 2014

Identification of alternative topological domains in chromatin

Darya Filippova; Robert Patro; Geet Duggal; Carl Kingsford

Chromosome conformation capture experiments have led to the discovery of dense, contiguous, megabase-sized topological domains that are similar across cell types and conserved across species. These domains are strongly correlated with a number of chromatin markers and have since been included in a number of analyses. However, functionally relevant domains may exist at multiple length scales. We introduce a new and efficient algorithm that is able to capture persistent domains across various resolutions by adjusting a single scale parameter. The ensemble of domains we identify allows us to quantify the degree to which the domain structure is hierarchical as opposed to overlapping, and our analysis reveals a pronounced hierarchical structure in which larger stable domains tend to completely contain smaller domains. The identified novel domains are substantially different from domains reported previously and are highly enriched for insulating factor CTCF binding and histone marks at the boundaries.
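The abstract describes recovering domains at different resolutions by tuning a single scale parameter. The Python sketch below illustrates that idea under stated assumptions: the interval score (sum of the contact-matrix block over bins k..l divided by the interval length raised to a hypothetical exponent `gamma`, minus the mean score of all intervals of the same length) and the helper names `domain_scores` and `best_domains` are illustrative choices, not necessarily the paper's exact objective. A dynamic program then picks the non-overlapping set of intervals with the largest total positive score.

```python
from collections import defaultdict


def domain_scores(A, gamma):
    """Score every interval [k, l] (inclusive) of bins in contact matrix A.

    Assumed score: block sum of A over rows/cols k..l divided by
    (interval length) ** gamma, minus the mean of that quantity over all
    intervals of the same length.
    """
    n = len(A)
    # 2D prefix sums: P[i][j] = sum of A[0..i-1][0..j-1]
    P = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(n):
            P[i + 1][j + 1] = A[i][j] + P[i][j + 1] + P[i + 1][j] - P[i][j]

    raw = {}
    for k in range(n):
        for l in range(k, n):
            block = P[l + 1][l + 1] - P[k][l + 1] - P[l + 1][k] + P[k][k]
            raw[(k, l)] = block / (l - k + 1) ** gamma

    by_len = defaultdict(list)
    for (k, l), s in raw.items():
        by_len[l - k].append(s)
    mean = {d: sum(v) / len(v) for d, v in by_len.items()}
    return {(k, l): s - mean[l - k] for (k, l), s in raw.items()}


def best_domains(A, gamma):
    """Non-overlapping intervals maximizing total positive score (DP)."""
    n = len(A)
    q = domain_scores(A, gamma)
    best = [0.0] * (n + 1)    # best[i]: optimum over bins 0..i-1
    start = [None] * (n + 1)  # start bin of a domain ending at bin i-1
    for l in range(n):
        best[l + 1] = best[l]  # option: bin l belongs to no domain
        for k in range(l + 1):
            if q[(k, l)] > 0 and best[k] + q[(k, l)] > best[l + 1]:
                best[l + 1] = best[k] + q[(k, l)]
                start[l + 1] = k
    # Backtrack from the last bin to recover the chosen intervals.
    domains, i = [], n
    while i > 0:
        if start[i] is None:
            i -= 1
        else:
            domains.append((start[i], i - 1))
            i = start[i]
    return domains[::-1]


# Toy 6-bin matrix with two dense blocks (bins 0-2 and 3-5).
A = [[10 if (i < 3) == (j < 3) else 1 for j in range(6)] for i in range(6)]
print(best_domains(A, 1.0))  # [(0, 2), (3, 5)]
```

Rerunning `best_domains` over a range of `gamma` values is one simple way to collect domains at several scales: because every interval of the same length is divided by the same factor, larger `gamma` shrinks the scores of long intervals relative to short ones, which tends to favor smaller domains.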

Transportation Research Record | 2009

Visual Analytics for Transportation Incident Data Sets

Krist Wongsuphasawat; Michael L. Pack; Darya Filippova; Michael VanDaniker; Andreea Olea

Transportation systems are being monitored at an unprecedented scope, which is resulting in tremendously detailed traffic and incident databases. Although the transportation community emphasizes developing standards for storing these incident data, little effort has been made to design appropriate visual analytics tools to explore the data, extract meaningful knowledge, and represent results. Analyzing these large multivariate geospatial data sets is a nontrivial task. A novel, web-based, visual analytics tool called Fervor is proposed as an application that affords sophisticated, yet user-friendly, analysis of transportation incident data sets. Interactive maps, histograms, two-dimensional plots, and parallel coordinates plots are four featured visualizations that are integrated to allow users to interact simultaneously with and see relationships among multiple visualizations. Using a rich set of filters, users can create custom conditions to filter data and focus on a smaller data set. However, because of the multivariate nature of the data, finding interesting relationships can be a time-consuming task. Therefore, a rank-by-feature framework has been adopted and further expanded to quantify the strength of relationships among the different fields describing the data. In this paper, transportation incident data collected by the Maryland State Highway Administration's CHART program are used; however, the tool can be easily modified to accept other transportation data sets.
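A rank-by-feature framework scores candidate views of the data and lists the strongest first. As a minimal sketch of that ranking idea, the Python below scores every pair of numeric fields by the absolute Pearson correlation and sorts the pairs. The choice of correlation as the ranking criterion, the helper names, and the incident field names (`duration_min`, `lanes_closed`, `vehicles`) are hypothetical; Fervor's expanded framework may use other relationship measures.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either field is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def rank_field_pairs(records, fields):
    """Rank pairs of numeric fields by |Pearson r|, strongest first."""
    scored = []
    for f, g in combinations(fields, 2):
        r = pearson([rec[f] for rec in records], [rec[g] for rec in records])
        scored.append((f, g, abs(r)))
    return sorted(scored, key=lambda t: t[2], reverse=True)


# Hypothetical toy incident records; field names are illustrative only.
incidents = [
    {"duration_min": 10, "lanes_closed": 1, "vehicles": 3},
    {"duration_min": 20, "lanes_closed": 2, "vehicles": 1},
    {"duration_min": 30, "lanes_closed": 3, "vehicles": 4},
    {"duration_min": 40, "lanes_closed": 4, "vehicles": 1},
]
ranking = rank_field_pairs(incidents, ["duration_min", "lanes_closed", "vehicles"])
print(ranking[0])  # ('duration_min', 'lanes_closed', 1.0)
```

In an interactive tool, a list like `ranking` would drive which two-dimensional plot the user is pointed to first, sparing a manual search through every field combination.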

Information Reuse and Integration | 2009

ICE: visual analytics for transportation incident datasets

Michael L. Pack; Krist Wongsuphasawat; Michael VanDaniker; Darya Filippova

Transportation systems are being monitored at an unprecedented scope, resulting in tremendously detailed traffic and incident databases. While the transportation community emphasizes developing standards for storing this incident data, little effort has been made to design appropriate visual analytics tools to explore the data, extract meaningful knowledge, and represent results. Analyzing these large multivariate geospatial datasets is a nontrivial task. A novel, web-based visual analytics tool called ICE (Incident Cluster Explorer) is proposed as an application that affords sophisticated yet user-friendly analysis of transportation incident datasets. Interactive maps, histograms, two-dimensional plots, and parallel coordinates plots are four visualizations that are integrated to allow users to simultaneously interact with and see relationships between multiple visualizations. A rich set of filters lets users create custom conditions to filter data and focus on a smaller dataset. Due to the multivariate nature of the data, a rank-by-feature framework has been expanded to quantify the strength of relationships between the different fields.

Knowledge Discovery and Data Mining | 2012

The missing models: a data-driven approach for learning how networks grow

Robert Patro; Geet Duggal; Emre Sefer; Hao Wang; Darya Filippova; Carl Kingsford

Probabilistic models of network growth have been extensively studied as idealized representations of network evolution. Models, such as the Kronecker model, duplication-based models, and preferential attachment models, have been used for tasks such as representing null models, detecting anomalies, algorithm testing, and developing an understanding of various mechanistic growth processes. However, developing a new growth model to fit observed properties of a network is a difficult task, and as new networks are studied, new models must constantly be developed. Here, we present a framework, called GrowCode, for the automatic discovery of novel growth models that match user-specified topological features in undirected graphs. GrowCode introduces a set of basic commands that are general enough to encode several previously developed models. Coupling this formal representation with an optimization approach, we show that GrowCode is able to discover models for protein interaction networks, autonomous systems networks, and scientific collaboration networks that closely match properties such as the degree distribution, the clustering coefficient, and assortativity that are observed in real networks of these classes. Additional tests on simulated networks show that the models learned by GrowCode generate distributions of graphs with similar variance as existing models for these classes.

BMC Bioinformatics | 2012

Coral: an integrated suite of visualizations for comparing clusterings

Darya Filippova; Aashish Gadani; Carl Kingsford

Background: Clustering has become a standard analysis for many types of biological data (e.g., interaction networks, gene expression, metagenomic abundance). In practice, it is possible to obtain a large number of contradictory clusterings by varying which clustering algorithm is used, which data attributes are considered, how algorithmic parameters are set, and which near-optimal clusterings are chosen. It is difficult to sift through such a large collection of varied clusterings to determine which clustering features are affected by parameter settings or are artifacts of particular algorithms, and which represent meaningful patterns. Knowing which items are often clustered together helps to improve our understanding of the underlying data and to increase our confidence in generated modules.

Results: We present Coral, an application for interactive exploration of large ensembles of clusterings. Coral makes all-to-all clustering comparison easy, supports exploration of individual clusterings, allows tracking modules across clusterings, and supports identification of core and peripheral items in modules. We discuss how each visual component in Coral tackles a specific question related to clustering comparison and provide examples of their use. We also show how Coral could be used to visually and quantitatively compare clusterings with a ground truth clustering.

Conclusion: As a case study, we compare clusterings of a recently published protein interaction network of Arabidopsis thaliana. We use several popular algorithms to generate the network’s clusterings. We find that the clusterings vary significantly and that few proteins are consistently co-clustered in all clusterings. This is evidence that several clusterings should typically be considered when evaluating modules of genes, proteins, or sequences, and Coral can be used to perform a comprehensive analysis of these clustering ensembles.
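All-to-all comparison and "consistently co-clustered" items can both be phrased in terms of item pairs. The Python sketch below uses one standard pair-counting approach, which is an assumption here rather than Coral's documented set of measures: each clustering is reduced to the set of item pairs it places together, two clusterings are compared by the Jaccard index of those pair sets, and each pair gets the fraction of clusterings in which it co-occurs. The helper names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def co_clustered_pairs(clustering):
    """Set of unordered item pairs placed in the same cluster.

    clustering: dict mapping item -> cluster label.
    """
    members = defaultdict(list)
    for item, label in clustering.items():
        members[label].append(item)
    return {frozenset(p) for group in members.values()
            for p in combinations(group, 2)}


def pair_jaccard(c1, c2):
    """Jaccard similarity of the co-clustered pair sets of two clusterings."""
    p1, p2 = co_clustered_pairs(c1), co_clustered_pairs(c2)
    union = p1 | p2
    return len(p1 & p2) / len(union) if union else 1.0


def co_occurrence(clusterings):
    """Fraction of clusterings in which each pair is co-clustered."""
    counts = Counter()
    for c in clusterings:
        counts.update(co_clustered_pairs(c))
    return {pair: n / len(clusterings) for pair, n in counts.items()}


clusterings = [
    {"a": 0, "b": 0, "c": 1, "d": 1},
    {"a": 0, "b": 0, "c": 0, "d": 1},
]
# All-to-all comparison matrix.
matrix = [[pair_jaccard(x, y) for y in clusterings] for x in clusterings]
print(matrix)  # [[1.0, 0.25], [0.25, 1.0]]
# Pairs co-clustered in every clustering of the ensemble.
core = [tuple(sorted(p)) for p, f in co_occurrence(clusterings).items() if f == 1.0]
print(core)  # [('a', 'b')]
```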

Algorithms for Molecular Biology | 2013

Resolving spatial inconsistencies in chromosome conformation measurements

Geet Duggal; Robert Patro; Emre Sefer; Hao Wang; Darya Filippova; Samir Khuller; Carl Kingsford

Background: Chromosome structure is closely related to its function, and Chromosome Conformation Capture (3C) is a widely used technique for exploring spatial properties of chromosomes. 3C interaction frequencies are usually associated with spatial distances. However, the raw data from 3C experiments are an aggregation of interactions from many cells, and the spatial distances of any given interaction are uncertain.

Results: We introduce a new method for filtering 3C interactions that selects subsets of interactions that obey metric constraints of various strictness. We demonstrate that, although the problem is computationally hard, near-optimal results are often attainable in practice using well-designed heuristics and approximation algorithms. Further, we show that, compared with a standard technique, this metric filtering approach leads to (a) subgraphs with higher statistical significance, (b) lower embedding error, (c) lower sensitivity to initial conditions of the embedding algorithm, and (d) structures with better agreement with light microscopy measurements. Our filtering scheme is applicable to both a strict frequency-to-distance mapping and a more relaxed mapping from frequency to a range of distances.

Conclusions: Our filtering method for 3C data considers both metric consistency and statistical confidence simultaneously, resulting in lower-error embeddings that are biologically more plausible.
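To make "interactions that obey metric constraints" concrete, the Python sketch below applies one simple greedy heuristic, which is an illustration and not the paper's algorithm: interactions are visited in order of decreasing confidence, and an interaction is kept only if it closes no triangle with already-kept interactions that violates the triangle inequality. The input format `(u, v, distance, confidence)` and the helper name are assumptions, distances are assumed to come from some upstream frequency-to-distance mapping, and only 3-cycles are checked, which is weaker than full metric consistency (where no edge may be longer than any path between its endpoints).

```python
from collections import defaultdict


def greedy_triangle_filter(interactions):
    """Greedily keep interactions that violate no triangle inequality.

    interactions: iterable of (u, v, distance, confidence) with u != v.
    Higher-confidence interactions are considered first.
    """
    dist = defaultdict(dict)  # dist[u][v]: distance of a kept interaction
    kept = []
    for u, v, d, _conf in sorted(interactions, key=lambda t: t[3], reverse=True):
        consistent = True
        # Check every triangle (u, v, w) closed by two already-kept edges.
        for w in dist[u].keys() & dist[v].keys():
            a, b = dist[u][w], dist[v][w]
            if d > a + b or a > d + b or b > d + a:
                consistent = False
                break
        if consistent:
            dist[u][v] = dist[v][u] = d
            kept.append((u, v, d))
    return kept


loci = [
    (0, 1, 1.0, 0.9),
    (1, 2, 1.0, 0.8),
    (0, 2, 5.0, 0.1),  # 5.0 > 1.0 + 1.0: breaks the triangle inequality
]
print(greedy_triangle_filter(loci))  # [(0, 1, 1.0), (1, 2, 1.0)]
```

Visiting high-confidence interactions first is what couples statistical confidence to metric consistency in this sketch: when two interactions conflict, the less confident one is the one discarded.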

Workshop on Algorithms in Bioinformatics | 2013

Multiscale Identification of Topological Domains in Chromatin

Darya Filippova; Robert Patro; Geet Duggal; Carl Kingsford

Recent chromosome conformation capture experiments have led to the discovery of dense, contiguous, megabase-sized topological domains that are similar across cell types and conserved across species. These domains are strongly correlated with a number of chromatin markers and have since been included in a number of analyses. However, functionally relevant domains may exist at multiple length scales. We introduce a new and efficient algorithm that is able to capture persistent domains across various resolutions by adjusting a single scale parameter. The identified novel domains are substantially different from domains reported previously and are highly enriched for insulating factor CTCF binding and histone modifications at the boundaries.

Privacy, Security, Risk and Trust | 2012

Dynamic Exploration of Recording Sessions between Jazz Musicians over Time

Darya Filippova; Michael Fitzgerald; Carl Kingsford; Fernando Benadon

We present a new system for exploring, in an intuitive and interactive way, a large compendium of data about collaborations between jazz musicians. The system consists of an easy-to-use web application that marries an ego-network view of collaborations with an interactive timeline. We develop a new measure of collaboration strength that is used to highlight strong and weak collaborations in the network view. The ego-network is arranged using a novel algorithm for ordering nodes that avoids occlusion even when the network is frequently changing. Finally, the system is applied to a large, unique, hand-curated data set of recorded jazz collaborations. The system can be accessed at http://mapofjazz.com/socialcom.

Research in Computational Molecular Biology | 2017

Improving Bloom Filter Performance on Sequence Data Using k-mer Bloom Filters

David Pellow; Darya Filippova; Carl Kingsford

Using a sequence’s k-mer content rather than the full sequence directly has enabled significant performance improvements in several sequencing applications, such as metagenomic species identification, estimation of transcript abundances, and alignment-free comparison of sequencing data. Since k-mer sets often reach hundreds of millions of elements, traditional data structures are impractical for k-mer set storage, and Bloom filters and their variants are used instead. Bloom filters reduce the memory footprint required to store millions of k-mers while allowing for fast set containment queries, at the cost of a low false positive rate. We show that, because k-mers are derived from sequencing reads, the information about k-mer overlap in the original sequence can be used to reduce the false positive rate by up to 30× with little or no additional memory and with set containment queries that are only 1.3–1.6 times slower. Alternatively, we can leverage k-mer overlap information to store k-mer sets in about half the space while maintaining the original false positive rate. We consider several variants of such k-mer Bloom filters (kBF), derive theoretical upper bounds for their false positive rate, and discuss their range of applications and limitations. We provide a reference implementation of kBF at https://github.com/Kingsford-Group/kbf/.
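The overlap argument can be illustrated with a simplified neighbor check: a k-mer taken from a sequence overlaps its predecessor or successor by k-1 bases, so a query that passes the Bloom filter but has no neighbor that also passes is likely a false positive. The Python sketch below is a toy version of that idea, not the kBF reference implementation linked above. The class names, the SHA-256-based hashing, and the parameters (`m`, `h`, `k`) are assumptions, and it assumes k-mers come from a single ACGT sequence longer than k, so every stored k-mer has a stored neighbor and the extra check introduces no false negatives.

```python
import hashlib
import random


class BloomFilter:
    """Plain Bloom filter: m bits (one byte each, for clarity), h hashes."""

    def __init__(self, m, h):
        self.m, self.h = m, h
        self.bits = bytearray(m)

    def _positions(self, item):
        # Derive h positions by salting SHA-256 with the hash index.
        for i in range(self.h):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits[p] = 1

    def __contains__(self, item):
        return all(self.bits[p] for p in self._positions(item))


def kmers(seq, k):
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


class NeighborCheckKmerFilter:
    """Bloom filter query that also requires an overlapping neighbor k-mer."""

    def __init__(self, m, h, k):
        self.k = k
        self.bf = BloomFilter(m, h)

    def add_sequence(self, seq):
        # Assumes len(seq) > k, so every stored k-mer has a stored neighbor.
        for km in kmers(seq, self.k):
            self.bf.add(km)

    def __contains__(self, kmer):
        if kmer not in self.bf:
            return False
        # Accept only if some predecessor (b + kmer[:-1]) or successor
        # (kmer[1:] + b) also passes the Bloom filter.
        return any(b + kmer[:-1] in self.bf or kmer[1:] + b in self.bf
                   for b in "ACGT")


random.seed(0)
genome = "".join(random.choice("ACGT") for _ in range(2000))
kbf = NeighborCheckKmerFilter(m=8000, h=3, k=15)
kbf.add_sequence(genome)
true_kmers = set(kmers(genome, 15))
queries = ["".join(random.choice("ACGT") for _ in range(15)) for _ in range(5000)]
negatives = [q for q in queries if q not in true_kmers]
fp_plain = sum(q in kbf.bf for q in negatives)
fp_kbf = sum(q in kbf for q in negatives)
print(f"plain BF false positives: {fp_plain}, neighbor-checked: {fp_kbf}")
```

The neighbor check can only reject queries the plain filter accepts, so its false positive count is never higher; the price is up to eight extra filter lookups per positive query, which is consistent in spirit with the modest query slowdown reported in the abstract.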

Workshop on Algorithms in Bioinformatics | 2012

Resolving spatial inconsistencies in chromosome conformation data

Geet Duggal; Robert Patro; Emre Sefer; Hao Wang; Darya Filippova; Samir Khuller; Carl Kingsford

We introduce a new method for filtering noisy 3C interactions that selects subsets of interactions that obey metric constraints of various strictness. We demonstrate that, although the problem is computationally hard, near-optimal results are often attainable in practice using well-designed heuristics and approximation algorithms. Further, we show that, compared with a standard technique, this metric filtering approach leads to (a) subgraphs with higher total statistical significance, (b) lower embedding error, (c) lower sensitivity to initial conditions of the embedding algorithm, and (d) structures with better agreement with light microscopy measurements.
