Alexander Rasin
DePaul University
Publications
Featured research published by Alexander Rasin.
Extending Database Technology | 2013
Alexander Rasin; Stanley B. Zdonik
Good database design is typically a very difficult and costly process. As database systems get more complex and as the amount of data under management grows, the stakes increase accordingly. Past research produced a number of design tools capable of automatically selecting secondary indexes and materialized views for a known workload. However, a significant bulk of research on automated database design has been done in the context of row-store DBMSes. While this work has produced effective design tools, new specialized database architectures demand a rethinking of automated design algorithms. In this paper, we present results for an automatic design tool that is aimed at column-oriented DBMSes on OLAP workloads. In particular, we have chosen a commercial column-store DBMS that supports data sorting. In this setting, the key problem is selecting proper sort orders and compression schemes for the columns as well as appropriate pre-join views. This paper describes our automatic design algorithms as well as the results of experiments using the tool on realistic data sets.
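To make the design problem concrete, here is a minimal sketch, in the spirit of the abstract but not the paper's actual algorithm: rank candidate sort-order columns by how often the workload filters on them, then pick per-column compression from cardinality statistics. The workload format, thresholds, scheme names, and column names are all illustrative assumptions.

```python
# A minimal sketch (assumptions throughout, not the paper's algorithm): rank columns for
# the sort order by weighted predicate frequency, then pick per-column compression
# from cardinality statistics. Workload format, thresholds, and names are invented.
from collections import Counter

def choose_sort_order(workload, max_cols=3):
    """Weight each predicate column by the frequency of the queries that use it."""
    score = Counter()
    for query in workload:
        for col in query["predicate_cols"]:
            score[col] += query["frequency"]
    return [col for col, _ in score.most_common(max_cols)]

def choose_compression(column_stats, sort_order):
    """RLE suits sorted or very low-cardinality columns, dictionary suits other
    low-cardinality columns, and a general LZ-style scheme covers the rest."""
    schemes = {}
    for col, stats in column_stats.items():
        if col in sort_order or stats["distinct_ratio"] < 0.01:
            schemes[col] = "RLE"
        elif stats["distinct_ratio"] < 0.1:
            schemes[col] = "DICTIONARY"
        else:
            schemes[col] = "LZ"
    return schemes

workload = [{"predicate_cols": ["ship_date", "region"], "frequency": 120},
            {"predicate_cols": ["region"], "frequency": 80}]
column_stats = {"ship_date": {"distinct_ratio": 0.002},
                "region":    {"distinct_ratio": 0.0001},
                "revenue":   {"distinct_ratio": 0.9}}
order = choose_sort_order(workload)
print(order, choose_compression(column_stats, order))
```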
IEEE International Conference on Requirements Engineering | 2014
Piotr Pruski; Sugandha Lohar; Rundale Aquanette; Greg Ott; Sorawit Amornborvornwong; Alexander Rasin; Jane Cleland-Huang
One of the surprising observations of traceability in practice is the under-utilization of existing trace links. Organizations often create links in order to meet compliance requirements, but then fail to capitalize on the potential benefits of those links to provide support for activities such as impact analysis, test regression selection, and coverage analysis. One of the major adoption barriers is the lack of accessibility to the underlying trace data, combined with the fact that many project stakeholders lack the skills to formulate complex trace queries. To address these challenges we introduce TiQi, a natural language approach, which allows users to write or speak trace queries in their own words. TiQi includes a vocabulary and associated grammar learned from analyzing NL queries collected from trace practitioners. It is evaluated against trace queries gathered from practitioners in two different project environments.
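For intuition about what vocabulary-driven translation of a trace query might look like, the toy sketch below maps one natural language question to SQL over an invented trace schema. The tables (requirement, trace_link, test_case) and the single supported pattern are assumptions for illustration; TiQi's actual grammar is learned from practitioner queries and covers far more.

```python
# Toy sketch only: keyword-driven translation of one trace query into SQL. The schema
# (requirement, trace_link, test_case) is invented; TiQi's actual vocabulary and grammar
# are learned from queries collected from trace practitioners.
def translate(nl_query):
    nl = nl_query.lower()
    if "requirement" in nl and "test" in nl and "fail" in nl:
        return ("SELECT r.* FROM requirement r "
                "JOIN trace_link t ON t.source_id = r.id "
                "JOIN test_case tc ON tc.id = t.target_id "
                "WHERE tc.status = 'FAILED'")
    raise ValueError("query not covered by this toy vocabulary")

print(translate("Which requirements are linked to failed test cases?"))
```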
Extending Database Technology | 2012
Hideaki Kimura; Carleton Coffrin; Alexander Rasin; Stanley B. Zdonik
Many database applications deploy hundreds or thousands of indexes to speed up query execution. Despite a plethora of prior work on index selection, designing and deploying indexes remains a difficult task for database administrators. First, real-world businesses often require online index deployment, and the traditional off-line approach to index selection ignores intermediate workload performance during index deployment. Second, recent work on on-line index selection does not address the effects of complex interactions that manifest during index deployment. In this paper, we propose a new approach that incorporates transitional design performance into the overall problem of physical database design. We call our approach Incremental Database Design. As the first step in this direction, we study the problem of ordering index deployment. The benefits of a good index deployment order are twofold: (1) a prompt query runtime improvement and (2) a reduced total time to deploy the design. Finding an effective deployment order is difficult due to complex index interaction and a factorial number of possible solutions. We formulate a mathematical model to represent the index ordering problem and demonstrate that Constraint Programming (CP) is a more efficient solution than other methods such as mixed-integer programming and A* search. In addition to exact search techniques, we also study local search algorithms that make significant improvements over a greedy solution with minimal computational overhead. Our empirical analysis using the TPC-H dataset shows that our pruning techniques can reduce the size of the search space by many orders of magnitude. Using the TPC-DS dataset, we verify that our local search algorithm is a highly scalable and stable method for quickly finding the best known solutions.
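The ordering objective can be illustrated with a small model: while an index is being built, the workload keeps running at its current cost, so cheap high-benefit indexes should usually be deployed first. The sketch below is a simplified stand-in rather than the paper's CP formulation; it assumes independent index benefits and invented numbers, and compares a greedy ratio ordering with an exhaustive check on a tiny instance.

```python
# Simplified stand-in (not the paper's CP model): index benefits are assumed independent
# and the numbers are invented. While an index is built, the workload keeps running at
# its current cost, so the order in which indexes are deployed changes the total cost.
from itertools import permutations

indexes = {"idx_a": (5.0, 30.0), "idx_b": (2.0, 10.0), "idx_c": (8.0, 25.0)}  # (build time, benefit)
BASE_COST = 100.0  # workload cost per time unit before any index exists

def transition_cost(order):
    total, current = 0.0, BASE_COST
    for name in order:
        build_time, benefit = indexes[name]
        total += current * build_time   # workload runs at the current cost during this build
        current -= benefit              # once built, the index lowers the workload cost
    return total

greedy = sorted(indexes, key=lambda n: indexes[n][1] / indexes[n][0], reverse=True)
best = min(permutations(indexes), key=transition_cost)   # exhaustive check, feasible only for tiny inputs
print(greedy, transition_cost(greedy))
print(list(best), transition_cost(best))
```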
Requirements Engineering Foundation for Software Quality | 2016
Sugandha Lohar; Jane Cleland-Huang; Alexander Rasin
[Context and Motivation:] In current practice, existing traceability data is often underutilized due to lack of accessibility and the difficulties users have in constructing the complex SQL queries needed to address realistic Software Engineering questions. In our prior work we therefore presented TiQi, a natural language (NL) interface for querying software projects. TiQi has been shown to transform a set of trace queries collected from IT experts at accuracy rates ranging from 47% to 93%. [Question/problem:] However, users need to quickly determine whether TiQi has correctly understood the NL query. [Principal ideas/results:] TiQi needs to communicate the transformed query back to the user and provide support for disambiguation and correction. In this paper we report on three studies we conducted to compare the effectiveness of four query representation techniques. [Contribution:] We show that simultaneously displaying a visual query representation, SQL, and a sample of the data results enabled users to most accurately evaluate the correctness of the transformed query.
Computers in Biology and Medicine | 2015
Jose R. Zamacona; Ronald Niehaus; Alexander Rasin; Jacob D. Furst; Daniela Stan Raicu
Computer-aided diagnosis systems can play an important role in lowering the workload of clinical radiologists and reducing costs by automatically analyzing vast amounts of image data and providing meaningful and timely insights during the decision making process. In this paper, we present strategies on how to better manage the limited time of clinical radiologists in conjunction with predictive model diagnosis. We first introduce a metric for discriminating between the different categories of diagnostic complexity (such as easy versus hard) encountered when interpreting CT scans. Second, we propose to learn the diagnostic complexity using a classification approach based on low-level image features automatically extracted from pixel data. We then show how this classification can be used to decide how to best allocate additional radiologists to interpret a case based on its diagnosis category. Using a lung nodule image dataset, we determined that, by a simple division of cases into hard and easy to diagnose, the number of interpretations can be distributed to significantly lower the cost with limited loss in prediction accuracy. Furthermore, we show that with just a few low-level image features (18% of the original set) we are able to separate the easy from the hard cases for a significant subset (66%) of the lung nodule image data.
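A rough sketch of the second idea: train a classifier on low-level image features to predict whether a case is easy or hard to diagnose, then assign more readers to predicted-hard cases. The synthetic features, labels, and reader counts below are assumptions, not the paper's data or model.

```python
# Hedged sketch: the features, labels, and reader counts are synthetic stand-ins, not the
# paper's data or model. A classifier trained on low-level image features predicts easy
# vs. hard cases, and predicted-hard cases are assigned more readers.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 18))                   # 18 low-level texture/shape features per case
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)    # 1 = hard to diagnose (synthetic label)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X[:150], y[:150])

def readers_needed(case_features, easy_readers=1, hard_readers=3):
    """Give predicted-easy cases one reader and predicted-hard cases several."""
    is_hard = clf.predict(case_features.reshape(1, -1))[0] == 1
    return hard_readers if is_hard else easy_readers

print([readers_needed(case) for case in X[150:155]])
```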
Statistical and Scientific Database Management | 2017
James Wagner; Alexander Rasin; Dai Hai Ton That; Tanu Malik
RDBMSes only support one clustered index per database table that can speed up query processing. Database applications that continually ingest large amounts of data experience anything from slow query response times to long downtimes, as the clustered index ordering must be strictly maintained. In this paper, we show that application slowdown or downtime can often be avoided if database systems expose the physical location of attributes that are completely or approximately clustered. Towards this, we propose PLI, a physical location index, constructed by determining the physical ordering of an attribute and creating approximately sorted buckets that map physical ordering to attribute values in a live database. To use a PLI, incoming SQL queries are simply rewritten with physical ordering information for that particular database. Experiments show that queries using the PLI index significantly outperform queries using native unclustered (secondary) indexes, while the index itself requires much lower maintenance overhead compared to native clustered indexes.
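One way to picture a PLI, under assumptions that do not necessarily match the paper's implementation: scan rows in physical (storage) order, record per fixed-size bucket the min and max of an approximately clustered attribute, and at query time touch only the buckets whose range overlaps the predicate. The page ids, values, and bucket size below are invented for the example.

```python
# Illustrative sketch (not the paper's implementation): a coarse physical location index
# over an approximately clustered attribute. Page ids, values, and bucket size are invented.
def build_pli(rows_in_physical_order, bucket_size=2):
    """rows_in_physical_order: list of (page_id, attribute_value) in on-disk order."""
    buckets = []
    for i in range(0, len(rows_in_physical_order), bucket_size):
        chunk = rows_in_physical_order[i:i + bucket_size]
        values = [v for _, v in chunk]
        buckets.append({"pages": [p for p, _ in chunk],
                        "min": min(values), "max": max(values)})
    return buckets

def pages_for_range(pli, lo, hi):
    """Rewrite a range predicate into physical terms: scan only overlapping buckets."""
    return [page for b in pli if b["max"] >= lo and b["min"] <= hi for page in b["pages"]]

# The attribute arrives almost, but not perfectly, in sorted order, so buckets stay tight.
rows = [(0, 10), (1, 12), (2, 11), (3, 15), (4, 16), (5, 14), (6, 20), (7, 21)]
print(pages_for_range(build_pli(rows), 14, 16))   # -> [2, 3, 4, 5]: only two buckets scanned
```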
Automated Software Engineering | 2017
Jinfeng Lin; Yalin Liu; Jin Guo; Jane Cleland-Huang; William Goss; Wenchuang Liu; Sugandha Lohar; Natawut Monaikul; Alexander Rasin
Software projects produce large quantities of data such as feature requests, requirements, design artifacts, source code, tests, safety cases, release plans, and bug reports. If leveraged effectively, this data can be used to provide project intelligence that supports diverse software engineering activities such as release planning, impact analysis, and software analytics. However, project stakeholders often lack the skills to formulate the complex queries needed to retrieve, manipulate, and display the data in meaningful ways. To address these challenges we introduce TiQi, a natural language interface, which allows users to express software-related queries in natural language, either spoken or written. TiQi is a web-based tool. It visualizes available project data as a prompt to the user, accepts Natural Language (NL) queries, transforms those queries into SQL, and then executes the queries against a centralized or distributed database. Raw data is stored either directly in the database or retrieved dynamically at runtime from case tools and repositories such as GitHub and Jira. The transformed query is visualized back to the user as SQL and augmented UML, and raw data results are returned. Our tool demo can be found on YouTube at the following link: http://tinyurl.com/TIQIDemo.
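A minimal sketch of the execution step described above, with SQLite standing in for the project database and an invented bug_report table: run the translated SQL and hand back both the query (so the user can verify the interpretation) and the raw rows.

```python
# A minimal sketch under an invented schema: SQLite stands in for the project database.
# The tool's actual pipeline also handles distributed sources and augmented UML output.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE bug_report (id INTEGER PRIMARY KEY, title TEXT, status TEXT);
    INSERT INTO bug_report VALUES (1, 'Crash on login', 'OPEN'), (2, 'Typo in UI', 'CLOSED');
""")

def run_query(sql):
    """Execute translated SQL and return it together with its rows, so the user can
    check both the interpretation of the NL query and the data it produced."""
    return {"sql": sql, "rows": conn.execute(sql).fetchall()}

print(run_query("SELECT id, title FROM bug_report WHERE status = 'OPEN'"))
```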
International Conference on Data Mining | 2014
Mike Seidel; Alexander Rasin; Jacob D. Furst; Daniela Stan Raicu
The workload associated with the daily job of a clinical radiologist has been steadily increasing as the volume of archived and newly acquired images grows. Computer-aided diagnostic systems are becoming an indispensable tool for automating image analysis and providing preliminary diagnoses that can help guide radiologists' decisions. In this paper, we introduce a novel metric to evaluate the difficulty of reaching diagnostic consensus when interpreting a case and illustrate several benefits that such insight can provide. Using a lung nodule image dataset, we demonstrate how a metric-based case partitioning can be used to better select how many radiologists are assigned to each case and how to identify image features that provide important feedback to further assist with the diagnosis. This knowledge can also be leveraged to shed 25% of radiologist annotations without any loss in predictive accuracy.
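One simple stand-in for such a metric, offered for illustration rather than as the paper's exact definition, is the spread of per-radiologist ratings for a case: the more the readers disagree, the harder consensus is to reach. The rating scale and threshold below are invented.

```python
# Hedged sketch: score consensus difficulty as the spread of per-radiologist ratings.
# The 1-5 rating scale and the 0.8 threshold are illustrative assumptions, not the
# paper's exact metric.
import statistics

def consensus_difficulty(ratings):
    """Higher spread across readers means consensus is harder to reach."""
    return statistics.pstdev(ratings)

def partition(cases, threshold=0.8):
    easy, hard = [], []
    for case_id, ratings in cases.items():
        (hard if consensus_difficulty(ratings) > threshold else easy).append(case_id)
    return easy, hard

cases = {
    "nodule_01": [2, 2, 3, 2],   # readers largely agree  -> easy
    "nodule_02": [1, 4, 5, 2],   # readers disagree       -> hard
}
print(partition(cases))   # (['nodule_01'], ['nodule_02'])
```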
International Conference on Data Mining | 2013
Jose R. Zamacona; Alexander Rasin; Jacob D. Furst; Daniela Stan Raicu
The problem of classifying samples for which there is no definite label is a challenging one, in which multiple annotators can provide more certain input for a classifier. Unlike most active learning scenarios, which require identifying which images to annotate, we explore how many annotations can potentially be used per instance (one annotation per instance is only the initial step) and propose a threshold-based concept of estimated instance difficulty to guide the custom label acquisition strategy. Using a lung nodule image data set, we determined that, by a simple division of cases into easy and hard to classify, the number of annotations can be distributed to significantly lower the cost (number of acquired annotations) of building a reliable classifier. We show the entire range of available tradeoffs, from a small reduction in annotation cost with no perceptible accuracy loss to a large reduction in annotation cost with a minimal sacrifice of classification accuracy.
International Provenance and Annotation Workshop | 2018
Alexander Rasin; Tanu Malik; James Wagner; Caleb Kim
Where provenance is a relationship between a data item and the location from which this data was copied. In a DBMS, a typical use of where provenance is in establishing a copy-by-address relationship between the output of a query and the particular data value(s) that originated it. Normal DBMS operations create a variety of auxiliary copies of the data (e.g., indexes, MVs, cached copies). These copies exist over time, with relationships that evolve continuously: (A) indexes maintain the copy with a reference to the origin value, (B) MVs maintain the copy without a reference to the source table, and (C) cached copies are created once and are never maintained. A query may be answered from any of these auxiliary copies; however, this where provenance is not computed or maintained. In this paper, we describe sources from which forensic analysis of storage can derive where provenance of table data. We also argue that this computed where provenance can be useful (and perhaps necessary) for producing accurate forensic reports and evidence from maliciously altered databases, or for validating corrupted DBMS storage.
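As a rough illustration of the idea, the sketch below derives a coarse where-provenance record for a query output value by cross-referencing the copies of that value found in reconstructed storage: table pages, index entries that still reference their origin rows, and a cached copy with no back-reference. All structures are mocked assumptions; a real forensic tool would parse raw DBMS pages.

```python
# Illustrative sketch with mocked artifacts: a real forensic tool would parse raw DBMS
# pages. Here, copies of a value found in storage are cross-referenced to report where
# provenance for a query output.
table_pages = {("employee", 12): [("row_p12_r3", "Alice", 91000)]}        # (table, page) -> rows
index_entries = {"idx_salary": [(91000, ("employee", 12, "row_p12_r3"))]} # value -> origin row
cached_copies = {"mv_salaries": [("Alice", 95000)]}                       # no origin reference

def where_provenance(value):
    """List every storage location that still holds (a copy of) the value."""
    sources = []
    for (table, page), rows in table_pages.items():
        for row in rows:
            if value in row:
                sources.append(("table page", table, page, row[0]))
    for index, entries in index_entries.items():
        for key, origin in entries:
            if key == value:
                sources.append(("index entry", index, origin))
    for cache, rows in cached_copies.items():
        for row in rows:
            if value in row:
                sources.append(("cached copy, origin unknown", cache))
    return sources

print(where_provenance(91000))   # a forensic report can also flag copies that disagree
```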