Svetlozar Nestorov
University of Chicago
Publication
Featured research published by Svetlozar Nestorov.
Computing in Science and Engineering | 2012
David Dominic Landis; Jens S. Hummelshøj; Svetlozar Nestorov; Jeffrey Greeley; Marcin Dulak; Thomas Bligaard; Jens K. Nørskov; Karsten Wedel Jacobsen
The possibilities for designing new materials based on quantum physics calculations are rapidly growing, but these design efforts lead to a significant increase in the amount of computational data created. The Computational Materials Repository (CMR) addresses this data challenge and provides a software infrastructure that supports the collection, storage, retrieval, analysis, and sharing of data produced by many electronic-structure simulators.
Decision Support Systems | 2006
Nenad Jukic; Svetlozar Nestorov
Data warehouses store data that explicitly and implicitly reflect customer patterns and trends, financial and business practices, strategies, know-how, and other valuable managerial information. In this paper, we suggest a novel way of acquiring more knowledge from corporate data warehouses. Association-rule mining, which captures co-occurrence patterns within data, has attracted considerable efforts from data warehousing researchers and practitioners alike. In this paper, we present a new data-mining method called qualified association rules. Qualified association rules capture correlations across the entire data warehouse, not just over an extracted and transformed portion of the data that is required when a standard data-mining tool is used.
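As an illustrative sketch (not the paper's actual algorithm), a qualified association rule can be thought of as an ordinary co-occurrence rule mined separately within each value of a qualifying attribute, such as the day of the week. The function name and data layout below are hypothetical:

```python
from collections import Counter
from itertools import combinations

def qualified_rules(transactions, min_support):
    """Find item pairs whose co-occurrence support, computed within each
    qualifier value, meets min_support. Simplified illustrative sketch."""
    by_qual = {}
    for qual, items in transactions:
        by_qual.setdefault(qual, []).append(items)
    rules = []
    for qual, txns in by_qual.items():
        counts = Counter()
        for items in txns:
            for pair in combinations(sorted(set(items)), 2):
                counts[pair] += 1
        n = len(txns)
        for (a, b), c in counts.items():
            support = c / n
            if support >= min_support:
                rules.append((qual, a, b, support))
    return rules

txns = [
    ("Sat", ["beer", "chips"]),
    ("Sat", ["beer", "chips", "salsa"]),
    ("Mon", ["beer", "milk"]),
]
# The beer-chips pair qualifies on Saturdays; beer-salsa does not
# reach the support threshold.
print(qualified_rules(txns, 0.8))
```

The point of the qualifier is that a pair such as beer-chips may be strong only on Saturdays, a pattern that disappears when support is averaged over the whole data set.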
Hawaii International Conference on System Sciences | 2003
Svetlozar Nestorov; Nenad Jukic
Many organizations often underutilize their existing data warehouses. In this paper, we suggest a way of acquiring more information from corporate data warehouses without the complications and drawbacks of deploying additional software systems. Association-rule mining, which captures co-occurrence patterns within data, has attracted considerable efforts from data warehousing researchers and practitioners alike. Unfortunately, most data mining tools are loosely coupled, at best, with the data warehouse repository. Furthermore, these tools can often find association rules only within the main fact table of the data warehouse (thus ignoring the information-rich dimensions of the star schema) and are not easily applied on non-transaction level data often found in data warehouses. In this paper, we present a new data-mining framework that is tightly integrated with the data warehousing technology. Our framework has several advantages over the use of separate data mining tools. First, the data stays at the data warehouse, and thus the management of security and privacy issues is greatly reduced. Second, we utilize the query processing power of a data warehouse itself, without using a separate data-mining tool. In addition, this framework allows ad-hoc data mining queries over the whole data warehouse, not just over a transformed portion of the data that is required when a standard data-mining tool is used. Finally, this framework also expands the domain of association-rule mining from transaction-level data to aggregated data as well.
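The in-warehouse approach can be sketched with plain SQL: a self-join over a transaction table computes pairwise co-occurrence counts inside the database engine itself, so no data is extracted into a separate mining tool. The schema and support threshold below are hypothetical, with SQLite standing in for a warehouse engine:

```python
import sqlite3

# A tiny "fact table" of (transaction id, item) rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (txn_id INTEGER, item TEXT)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", [
    (1, "beer"), (1, "chips"),
    (2, "beer"), (2, "chips"), (2, "salsa"),
    (3, "milk"),
])

# The self-join pairs items within each transaction; GROUP BY counts
# co-occurrences, and HAVING applies a minimum-support cutoff.
rows = conn.execute("""
    SELECT a.item, b.item, COUNT(*) AS cnt
    FROM sales a JOIN sales b
      ON a.txn_id = b.txn_id AND a.item < b.item
    GROUP BY a.item, b.item
    HAVING cnt >= 2
""").fetchall()
print(rows)  # [('beer', 'chips', 2)]
```

Because the query runs where the data lives, security management stays within the warehouse, as the abstract argues.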
IEEE International Conference on Cloud Computing Technology and Science | 2011
Gabriela Turcu; Ian T. Foster; Svetlozar Nestorov
Text analysis tools are nowadays required to process increasingly large corpora which are often organized as small files (abstracts, news articles, etc.). Cloud computing offers a convenient, on-demand, pay-as-you-go computing environment for solving such problems. We investigate provisioning on the Amazon EC2 cloud from the user perspective, attempting to provide a scheduling strategy that is both timely and cost effective. We derive an execution plan using an empirically determined application performance model. A first goal of our performance measurements is to determine an optimal file size for our application to consume. Using the subset-sum first fit heuristic we reshape the input data by merging files in order to match as closely as possible the desired file size. This also speeds up the task of retrieving the results of our application, by having the output be less segmented. Using predictions of the performance of our application based on measurements on small data sets, we devise an execution plan that meets a user specified deadline while minimizing cost.
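A minimal sketch of the first-fit packing step, assuming the goal is to merge small input files into groups whose total size approaches a target without exceeding it (the function name and sizes are hypothetical):

```python
def first_fit_merge(file_sizes, target):
    """Greedy first-fit: place each file (largest first) into the first
    group that still has room under the target size."""
    groups = []
    for size in sorted(file_sizes, reverse=True):
        for g in groups:
            if sum(g) + size <= target:
                g.append(size)
                break
        else:
            groups.append([size])  # no group fits: open a new one
    return groups

sizes = [7, 5, 4, 3, 1]
print(first_fit_merge(sizes, 10))  # [[7, 3], [5, 4, 1]]
```

Each output group would be merged into one file close to the empirically determined optimal size, reducing per-file overhead and output fragmentation.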
PLOS Computational Biology | 2014
David R. Blair; Kanix Wang; Svetlozar Nestorov; James A. Evans; Andrey Rzhetsky
Synonymous relationships among biomedical terms are extensively annotated within specialized terminologies, implying that synonymy is important for practical computational applications within this field. It remains unclear, however, whether text mining actually benefits from documented synonymy and whether existing biomedical thesauri provide adequate coverage of these linguistic relationships. In this study, we examine the impact and extent of undocumented synonymy within a very large compendium of biomedical thesauri. First, we demonstrate that missing synonymy has a significant negative impact on named entity normalization, an important problem within the field of biomedical text mining. To estimate the amount of synonymy currently missing from thesauri, we develop a probabilistic model for the construction of synonym terminologies that is capable of handling a wide range of potential biases, and we evaluate its performance using the broader domain of near-synonymy among general English words. Our model predicts that over 90% of these relationships are currently undocumented, a result that we support experimentally through “crowd-sourcing.” Finally, we apply our model to biomedical terminologies and predict that they are missing the vast majority (>90%) of the synonymous relationships they intend to document. Overall, our results expose the dramatic incompleteness of current biomedical thesauri and suggest the need for “next-generation,” high-coverage lexical terminologies.
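A toy example of why missing synonymy hurts named entity normalization: a mention maps to a canonical concept only if its surface form is documented in the thesaurus. The terms and identifier below are purely illustrative:

```python
# A miniature thesaurus mapping surface forms to a canonical concept ID.
thesaurus = {
    "heart attack": "C0027051",
    "myocardial infarction": "C0027051",
}

def normalize(mention, thesaurus):
    """Return the canonical ID for a mention, or None if the surface
    form is an undocumented synonym."""
    return thesaurus.get(mention.lower())

print(normalize("Myocardial Infarction", thesaurus))  # C0027051
print(normalize("cardiac infarction", thesaurus))     # None: undocumented
```

The second lookup fails even though the phrase is synonymous with the others, which is exactly the normalization error that undocumented synonymy causes at scale.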
Data Warehousing and Knowledge Discovery | 2008
Gabriela Turcu; Svetlozar Nestorov; Ian T. Foster
In the data-driven field of bioinformatics, data warehouses have emerged as a common solution to facilitate data analysis. The uncertainty, complexity, and rate of change of biological data underscore the importance of capturing its evolution. To capture information about our database's evolution, we incorporate a temporal dimension in our data model, which we implement by means of lifespan timestamps attached to every tuple in the warehouse. This temporal information allows us to keep a full history of the warehouse and recreate any past version for purposes of auditing. Equally importantly, this information facilitates the incremental maintenance of the warehouse. We maintain the warehouse incrementally not only for relations derived by applying the standard relational operators but also for computed relations. In particular, we consider computed relations obtained through external BLAST sequence alignment computations, which are often identified as a bottleneck in the integrated warehouse maintenance process. Our experiments with subsets of protein sequences from the NCBI non-redundant database demonstrate at least 10-fold speedups for realistic target space size increases of 1% to 5%.
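The lifespan-timestamp scheme can be sketched as follows, assuming each tuple carries a half-open validity interval [start, end), with end = None marking tuples that are still current (the data and field names are hypothetical):

```python
from datetime import date

# Each row: (protein id, sequence, lifespan start, lifespan end).
# end=None means the tuple is current; a superseded tuple keeps its
# old lifespan, so the full history is preserved.
warehouse = [
    ("P1", "MKLV",  date(2007, 1, 1), date(2007, 6, 1)),
    ("P1", "MKLVA", date(2007, 6, 1), None),
    ("P2", "GATTC", date(2007, 3, 1), None),
]

def as_of(rows, when):
    """Recreate the warehouse as it existed on a given date."""
    return [(pid, seq) for pid, seq, start, end in rows
            if start <= when and (end is None or when < end)]

print(as_of(warehouse, date(2007, 4, 1)))  # P1's old sequence, P2
print(as_of(warehouse, date(2007, 7, 1)))  # P1's new sequence, P2
```

Filtering on lifespans also identifies exactly the tuples that changed in an interval, which is what makes incremental maintenance of derived and computed relations possible.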
Journal of Database Management | 2005
Nenad Jukic; Svetlozar Nestorov; Susan V. Vrbsky; Allen S. Parrish
This paper presents an extension to a Multi-Level Secure (MLS) data model that requires the classification of data and users into multiple security levels. In MLS systems, cover stories allow information provided to users at lower security levels to differ from information provided to users at higher security levels. Previous versions of the MLS model did not permit cover stories for key attributes, because the key is used to relate the various cover stories for a particular entity. We have extended the MLS model to include non-key related cover stories so that key attributes can also have different values at different security levels. In this paper we describe the necessary model changes and modifications to the relational algebra, which are required to implement non-key related cover stories. We demonstrate the improvements made by these changes and discuss the implementation and performance of a system based on the described concepts.
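A simplified sketch of cover stories, assuming each attribute (including the key) stores one value per classification level and a query returns the highest-classified value at or below the user's clearance. The levels and data are hypothetical, not the paper's formal model:

```python
# Classification levels, lowest to highest clearance.
LEVELS = {"U": 0, "S": 1, "TS": 2}

# One entity whose key attribute ("name") has a cover story:
# unclassified users see a different name than top-secret users.
row = {
    "name": [("U", "J. Smith"), ("TS", "Agent X")],
    "mission": [("U", "training"), ("TS", "recon")],
}

def view(row, clearance):
    """Per attribute, return the highest-classified value visible
    at the given clearance level."""
    c = LEVELS[clearance]
    out = {}
    for attr, versions in row.items():
        visible = [(LEVELS[lvl], val) for lvl, val in versions
                   if LEVELS[lvl] <= c]
        out[attr] = max(visible)[1] if visible else None
    return out

print(view(row, "U"))   # {'name': 'J. Smith', 'mission': 'training'}
print(view(row, "TS"))  # {'name': 'Agent X', 'mission': 'recon'}
```

Allowing the key attribute itself to vary by level is the extension the paper contributes; the non-key mechanism above then needs a separate way to relate the cover stories, which the paper's modified relational algebra provides.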
International Conference on Management of Data | 2003
Nenad Jukic; Svetlozar Nestorov; Susan V. Vrbsky
There has been an abundance of research within the last couple of decades in the area of multilevel secure (MLS) databases. Recent work in this field deals with the processing of multilevel transactions, expanding the logic of MLS query languages, and utilizing MLS principles within the realm of E-Business. However, there is a basic flaw within the MLS logic, which obstructs the handling of clearance-invariant aggregate queries and physical-entity related queries where some of the information in the database may be gleaned from the outside world. This flaw stands in the way of a more pervasive adoption of MLS models by the developers of practical applications. This paper clearly identifies the cause of this impediment -- the cover story dependence on the value of a user-defined key -- and proposes a practical solution.
International Journal of Business Intelligence Research | 2011
Nenad Jukic; Svetlozar Nestorov; Miguel Velasco; Jami Eddington
Association rules mining is one of the most successfully applied data mining methods in today’s business settings (e.g., Amazon or Netflix recommendations to customers). Qualified association rules mining is an extension of the association rules data mining method that uncovers previously unknown correlations which manifest themselves only under certain circumstances (e.g., on a particular day of the week), with the goal of improving action results, e.g., turning an underperforming campaign (spread too thin over the entire audience) into a highly targeted campaign that delivers results. Such correlations have so far not been easily discoverable using standard data mining tools. This paper describes a method for the straightforward discovery of qualified association rules and demonstrates the use of qualified association rules mining on an actual corporate data set. The data set is a subset of a corporate data warehouse for Sam’s Club, a division of Wal-Mart Stores, Inc. The experiments described in this paper illustrate how qualified association rules supplement standard association rules data mining methods and provide additional information that can be used to better target corporate actions.
Web-Age Information Management | 2007
Svetlozar Nestorov; Chuang Liu; Ian T. Foster
In this paper, we consider relational queries involving one or more constraints over the sum of multiple attributes (sum constraint queries). We develop rewriting techniques to transform a sum constraint query in order to enable its efficient processing by conventional relational database engines. We also consider the problem of producing partial results for sum constraint queries. We propose a framework for ranking tuples in a relation according to their likelihood of contributing to a tuple in the result of a sum constraint query. Sorting tuples using this framework provides an alternative to traditional sorting based on a single attribute value.
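One such rewriting can be sketched as follows, assuming all summed attributes are non-negative: the constraint attr1 + attr2 <= bound implies the single-attribute condition attr1 <= bound, which a conventional index can evaluate to prune candidates before the full sum is checked (the data and bound are illustrative, not from the paper):

```python
# Tuples of (attr1, attr2); select those with attr1 + attr2 <= bound.
tuples = [(3, 4), (6, 1), (9, 2), (2, 2)]
bound = 7

# Necessary condition attr1 <= bound: an index-friendly pre-filter
# (valid only because attributes are assumed non-negative).
candidates = [t for t in tuples if t[0] <= bound]

# Exact sum constraint applied to the surviving candidates.
result = [t for t in candidates if t[0] + t[1] <= bound]

# Ranking by total sum: tuples most likely to satisfy the constraint
# come first, enabling partial results in a useful order.
ranked = sorted(tuples, key=lambda t: t[0] + t[1])

print(result)  # [(3, 4), (6, 1), (2, 2)]
print(ranked)  # [(2, 2), (3, 4), (6, 1), (9, 2)]
```

Sorting by the sum rather than by a single attribute is the alternative ordering the abstract describes for producing partial results early.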