Michael N. Gubanov
University of Texas at San Antonio
Publication
Featured research published by Michael N. Gubanov.
International Conference on Data Engineering | 2008
Michael N. Gubanov; Philip A. Bernstein; Alexander Moshchuk
Model management is a high-level programming language designed to efficiently manipulate schemas and mappings. It consists of robust operators that, combined in short programs, can solve complex metadata-oriented problems compactly. For instance, many enterprise data integration scenarios can be easily expressed in this high-level language, saving hundreds of development man-hours. Here we present the first model management engine with reverse-engineering support for data integration, one of the most pressing metadata-oriented problems. It merges two schemas based on the mappings between them and allows the user to correct the result while all the mappings are kept in sync automatically. This is much more convenient for the user than determining which mappings to correct in order to get the desired result. In addition, the engine supports restructuring merge, which is important when the sources are structured differently and cannot be mapped directly. While fully automatic schema merging is not yet possible, our work simplifies and automates this process enough to make it practical in complex data integration scenarios.
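The following toy Python sketch makes the merge idea concrete. It is not the engine described above. It assumes that schemas are flat sets of element names and that a mapping is a list of one-to-one correspondences. The names `merge`, `MergeResult` and `rename` are hypothetical. Each merged element records which source elements it stands for, so a user correction such as a rename does not require re-editing any mapping. Restructuring merge is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class MergeResult:
    # merged element name -> set of ("A" | "B", source element) pairs it stands for;
    # this provenance is what keeps the mappings to both sources in sync.
    elements: dict = field(default_factory=dict)

    def rename(self, old: str, new: str) -> None:
        """User correction: rename a merged element; its provenance moves with it."""
        if new in self.elements:
            raise ValueError(f"{new!r} already exists in the merged schema")
        self.elements[new] = self.elements.pop(old)


def merge(a: set, b: set, mapping: list) -> MergeResult:
    """Merge flat schemas a and b; each (x, y) in mapping says a.x corresponds to b.y."""
    result = MergeResult()
    # Corresponding elements collapse into one merged element, named after a's element.
    for x, y in mapping:
        result.elements.setdefault(x, set()).update({("A", x), ("B", y)})
    mapped_a = {x for x, _ in mapping}
    mapped_b = {y for _, y in mapping}
    # Unmapped elements are copied as-is; a clashing name from b gets a "B." prefix.
    for x in sorted(a - mapped_a):
        result.elements[x] = {("A", x)}
    for y in sorted(b - mapped_b):
        name = y if y not in result.elements else f"B.{y}"
        result.elements[name] = {("B", y)}
    return result


# Hypothetical example schemas and correspondences.
customers = {"CustID", "Name", "Phone"}
clients = {"ClientNo", "FullName", "Email", "Phone"}
merged = merge(customers, clients, [("CustID", "ClientNo"), ("Name", "FullName")])
merged.rename("Name", "CustomerName")  # the mapping to both sources survives the rename
```

The two unmapped `Phone` elements are deliberately kept apart. Without a correspondence between them, a mapping-driven merge has no evidence that they are the same thing.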
Information Reuse and Integration | 2011
Michael N. Gubanov; Linda G. Shapiro; Anna Pyayt
For at least a decade, the WWW, large enterprises, and desktop users have suffered from an inability to efficiently access and manage their data. Many solutions have been proposed in both academic and industrial research to help automate different aspects of this challenging problem. One of them, the UFO Repository introduced in [5], is currently gaining momentum by advocating an object-oriented approach to managing information overflow. An overview and evaluation of the UFO approach were published and reviewed in [5, 1]. This paper focuses on the algorithms for UFO creation and knowledge accumulation. We describe and evaluate these algorithms and demonstrate that UFO learning is surprisingly fast and accurate across several domains, even with a small amount of initial training data.
International Conference on Data Engineering | 2017
Shangyu Luo; Zekai J. Gao; Michael N. Gubanov; Luis Leopoldo Perez; Christopher Jermaine
As data analytics has become an important application for modern data management systems, a new category of data management system has appeared recently: the scalable linear algebra system. In this paper, we argue that a parallel or distributed database system is actually an excellent platform upon which to build such functionality. Most relational systems already support cost-based optimization, which is vital to scaling linear algebra computations, and it is well known how to make relational systems scale. We show that by making just a few changes to a parallel/distributed relational database system, such a system can be a competitive platform for scalable linear algebra. Taken together, our results should at least raise the possibility that brand-new systems designed from the ground up to support scalable linear algebra are not absolutely necessary, and that such systems could instead be built on top of existing relational technology. Our results also suggest that if scalable linear algebra is to be added to a modern dataflow platform such as Spark, it should be added on top of the system's more structured (relational) data abstractions rather than constructed directly on top of its raw dataflow operators.
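Matrix multiplication is the easiest place to see why a relational engine can host linear algebra. Store each matrix as a relation of (row, column, value) tuples. The product is then a join on the shared index followed by a grouped sum. The sketch below uses Python's built-in `sqlite3` only as a stand-in relational engine. The tuple-per-entry encoding is an illustrative assumption. It is not claimed to be the design evaluated in the paper, which the abstract describes only as "a few changes" to a parallel/distributed system.

```python
import sqlite3


def matmul_sql(a, b):
    """Multiply sparse matrices given as (row, col, value) triples using one SQL query.

    Returns a dict {(i, j): value} holding the non-empty entries of A @ B.
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE A (i INTEGER, k INTEGER, v REAL)")
    con.execute("CREATE TABLE B (k INTEGER, j INTEGER, v REAL)")
    con.executemany("INSERT INTO A VALUES (?, ?, ?)", a)
    con.executemany("INSERT INTO B VALUES (?, ?, ?)", b)
    # C[i][j] = sum over k of A[i][k] * B[k][j]: a join on k plus GROUP BY (i, j).
    rows = con.execute(
        """
        SELECT A.i, B.j, SUM(A.v * B.v)
        FROM A JOIN B ON A.k = B.k
        GROUP BY A.i, B.j
        """
    ).fetchall()
    con.close()
    return {(i, j): v for i, j, v in rows}


# 2x2 example: [[1, 2], [3, 4]] @ [[5, 6], [7, 8]]
a_triples = [(0, 0, 1), (0, 1, 2), (1, 0, 3), (1, 1, 4)]
b_triples = [(0, 0, 5), (0, 1, 6), (1, 0, 7), (1, 1, 8)]
product = matmul_sql(a_triples, b_triples)
```

Because the computation is an ordinary join-and-aggregate query, the engine's optimizer is free to choose the join algorithm and plan. That is the kind of existing cost-based machinery the abstract points to.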
Bioinformatics and Biomedicine | 2011
Michael N. Gubanov; Linda G. Shapiro
In this paper we discuss automatic pre-diagnostics of Alzheimer's disease using a new object-oriented data integration technology, UFO (Unified Famous Objects). UFO was originally introduced in [3] to simplify access to heterogeneous data in federated data sources.
Business Intelligence for the Real-Time Enterprises | 2008
Bogdan Alexe; Michael N. Gubanov; Mauricio A. Hernández; C. T. Howard Ho; Jen Wei Huang; Yannis Katsis; Lucian Popa; Barna Saha; Ioana Stanoi
The Clio project at IBM Almaden investigates foundational aspects of data transformation, with particular emphasis on the design and execution of schema mappings. We now use Clio as part of a broader data-flow framework in which mappings are just one component. These data-flows express complex transformations between several source and target schemas and require multiple mappings to be specified. This paper describes research issues we have encountered as we try to create and run these mapping-based data-flows. In particular, we describe how we use Unified Famous Objects (UFOs), a schema abstraction similar to business objects, as our data model, how we reason about flows of mappings over UFOs, and how we create and deploy transformations into different run-time engines.
Information Reuse and Integration | 2011
Michael N. Gubanov; Anna Pyayt; Linda G. Shapiro
While the problem of finding needed information on the Web is being solved by the major search engines, access to information in large text documents (e-books, conference proceedings, product manuals, etc.) is still very rudimentary. Thus, keyword search is often the only way to find the needle in the haystack. The Semantic Web research community has produced an abundance of relevant results that offer more robust access interfaces than keyword search. Here we describe a new hybrid document browser that offers an advanced user experience by combining keyword search with navigation over an automatically inferred hierarchical document index. Representing the browsing index internally as a collection of UFOs [23] yields more relevant search results and improves the user experience.
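One minimal way to combine keyword search with navigation over a hierarchical index is to return each hit together with its path from the root. The user can then jump to that position in the hierarchy and browse its neighbors. The sketch below assumes the index has already been inferred and is stored as a plain tree. `IndexNode` and `search` are hypothetical names, and the UFO-based internal representation mentioned above is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class IndexNode:
    """One entry of a (hypothetical) automatically inferred hierarchical index."""
    title: str
    text: str = ""
    children: list = field(default_factory=list)


def search(root: IndexNode, keyword: str) -> list:
    """Return the title path (root to node) of every node whose title or text
    contains the keyword, so each hit can be opened at its place in the hierarchy."""
    kw = keyword.lower()
    hits = []

    def walk(node: IndexNode, path: list) -> None:
        path = path + [node.title]
        if kw in node.title.lower() or kw in node.text.lower():
            hits.append(path)
        for child in node.children:
            walk(child, path)

    walk(root, [])
    return hits


manual = IndexNode("Manual", children=[
    IndexNode("Installation", "Unpack the device and connect the power cable."),
    IndexNode("Troubleshooting", children=[
        IndexNode("Power", "Check the power cable and the fuse."),
    ]),
])
hits = search(manual, "cable")
```

For this example, `hits` is `[["Manual", "Installation"], ["Manual", "Troubleshooting", "Power"]]`. Navigation then amounts to expanding the children of any node on a returned path.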
Information Reuse and Integration | 2013
Surya Cheemalapati; Michael N. Gubanov; Michael Del Vale; Anna Pyayt
Military personnel, airplane pilots, and bus drivers often operate in stressful conditions, where something unexpected can happen and lead to dangerous consequences if they do not respond properly. Additionally, stress adversely affects human decision-making abilities; therefore, prompt, preferably real-time, detection of fear is very important. Previous studies with non-portable multi-electrode electroencephalography (EEG) systems found that the ratio of slow-wave power to fast-wave power increases when a person is relaxed and decreases when s/he is scared. In this study we test a small portable EEG device and develop algorithms for real-time detection of a stressful condition: fear. During the experiment we compare EEG signals of subjects in a relaxed state with those in a stressed state while they watch a scene from a scary movie. The ratio of slow/fast wave powers was measured, and the observed pattern was similar to the one obtained using a multi-electrode system. We integrate stream-processing algorithms into the system to ensure real-time detection of any change in mental condition and timely generation of an alarm event.
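The slow/fast power ratio and the alarm logic can be sketched as follows. This is an illustration under stated assumptions, not the study's algorithm. The slow band is taken as 4 to 8 Hz and the fast band as 13 to 30 Hz. The alarm threshold and the 64 Hz sampling rate are hypothetical. The stream is processed in non-overlapping windows, and band power is computed with a naive DFT from the standard library.

```python
import cmath
import math


def band_power(samples, fs, lo, hi):
    """Sum of |X_k|^2 over DFT bins with lo <= f_k < hi (naive O(N^2) DFT, for clarity)."""
    n = len(samples)
    total = 0.0
    for k in range(1, n // 2 + 1):
        f = k * fs / n
        if lo <= f < hi:
            xk = sum(s * cmath.exp(-2j * math.pi * k * t / n) for t, s in enumerate(samples))
            total += abs(xk) ** 2
    return total


def slow_fast_ratio(samples, fs, slow=(4.0, 8.0), fast=(13.0, 30.0)):
    """Slow-wave power divided by fast-wave power (band edges in Hz are assumptions)."""
    fast_power = band_power(samples, fs, *fast)
    return math.inf if fast_power == 0 else band_power(samples, fs, *slow) / fast_power


def fear_alarms(stream, fs, window, threshold):
    """Consume samples one at a time; after each full, non-overlapping window,
    yield the window's start index if its slow/fast ratio drops below threshold."""
    buf, start = [], 0
    for x in stream:
        buf.append(x)
        if len(buf) == window:
            if slow_fast_ratio(buf, fs) < threshold:
                yield start
            start += window
            buf = []


# Synthetic 1-second windows at a hypothetical 64 Hz sampling rate.
FS = 64
relaxed = [math.sin(2 * math.pi * 6 * t / FS) + 0.2 * math.sin(2 * math.pi * 20 * t / FS)
           for t in range(FS)]
scared = [0.2 * math.sin(2 * math.pi * 6 * t / FS) + math.sin(2 * math.pi * 20 * t / FS)
          for t in range(FS)]
alarms = list(fear_alarms(relaxed + scared, FS, window=FS, threshold=1.0))
```

In the synthetic "relaxed" window, the 6 Hz component dominates and the ratio is about 25. In the "scared" window it drops to about 0.04, so only the second window (starting at sample 64) raises an alarm. A real system would use an FFT and overlapping windows, which this sketch omits.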
Information Reuse and Integration | 2012
Michael N. Gubanov; Anna Pyayt
Large-scale text mining research, informally called Big text, is a crucial part of the Big data agenda that recently started gaining momentum [24]. It targets new technologies to manage large amounts of unstructured textual data in order to quickly find and retrieve needed information. In the medical domain, fast access to information is especially important. Keyword search, the de facto standard for searching Electronic Health Records (EHR), is simple and therefore popular, but it is not ideal and often returns either too many irrelevant or too few relevant results. Clinicians, usually very short on time, cannot afford the trial and error of keyword search and therefore do not use all the information available in patient records. Next-generation patient care requires more efficient access to valuable information hidden in patient histories represented by millions of patient records. The Semantic Web research community has produced an abundance of relevant results that offer more robust access interfaces to unstructured data than keyword search. Here we describe a new hybrid browser specifically for EHR that offers an advanced user experience by combining keyword search with navigation over an automatically inferred hierarchical document index. Representing the browsing index internally as a collection of UFOs [25] yields more relevant search results and improves the user experience.
International Conference on Data Engineering | 2017
Michael N. Gubanov
The big data era brought us petabytes of data together with the challenges of storing and efficiently accessing large-scale datasets. However, it also surprised everyone with an enormous variety of data sources and types, and correspondingly different data models. Dealing with this variety of data models turned out to be a “hard nut to crack” for almost all existing data management engines. Data integration, a mature field addressing the problems of accessing and fusing data residing in more than one data source, has over the years produced feasible semi-automatic solutions. Most of these efficiently handle a handful of data sources represented in one or two different data models (e.g., relational and semi-structured) [Haas et al., 2005], [Gubanov et al., 2008]. While this is significant progress, most of these solutions do not scale up easily, since they usually require some form of human assistance, which is infeasible at scale. Unified Famous Objects (UFO) [Gubanov et al., 2009], [Bellahsene et al., 2011], [Gubanov et al., 2011a] was one of the first attempts to crack data fusion at scale. It introduced a new abstraction, the UFO, that could incrementally learn different representations of data objects in different sources and, over time, train itself to recognize and map them with high accuracy without supervision. With many such UFOs trained, the system would become more and more powerful, as it learned to recognize and access data objects across many sources without supervision. While UFO was definite progress towards scaling up data fusion, it was not a “silver bullet”, because it was built mostly for relational data. Hence, there is a need for a new large-scale data integration system that can handle many types of data. This paper sketches its architecture and outlines potential research challenges. We first describe a few key principles that such a system should follow, then discuss its architecture and research avenues.
1. Data Acquisition: ingest any incoming data for further classification and processing.
2. No Schema: incoming data does not need to fit a pre-defined schema; instead, the schema is extracted from the data if needed.
3. Scale: designed to accommodate and process large-scale heterogeneous datasets, for example, a fire hose of high-velocity sensor data or unstructured discharge reports from a hospital.
4. Seamless Data Fusion: unsupervised, source-oblivious fusion at scale to satisfy the user query.
5. “Esperanto” Query Language: a hybrid query language enabling access to the different data models used by heterogeneous data. It might be the result of partially composing several query languages, or a new language designed around a common data model.

The first principle sets POLYFUSE apart from the traditional group of DBMSes that were built for the relational data model and have been around for decades [Stonebraker et al., 1996]. The second principle removes the traditional requirement that data “obey” a pre-defined schema in order to get into a database. The third principle means the architecture should be designed to support very large heterogeneous datasets from the very beginning. The fourth and fifth principles indicate that, due to scale, any fusion algorithm should be unsupervised and any query language should be compatible with the different data models in POLYFUSE.
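The No Schema principle, extracting a schema from the data only when needed, can be illustrated with a small schema-inference pass over JSON-like records. This is a hypothetical sketch, not POLYFUSE's mechanism. `infer_schema` flattens nested objects into dotted paths and records every type observed for each path. It marks a path optional when some records lack it.

```python
import json


def infer_schema(records):
    """Infer {field path: {"types": [...], "optional": bool}} from schemaless records.

    Nested objects become dotted paths (e.g. "loc.room"); a path is optional
    if at least one record does not contain it.
    """
    types, counts = {}, {}

    def visit(obj, prefix):
        for key, value in obj.items():
            path = f"{prefix}{key}"
            if isinstance(value, dict):
                visit(value, path + ".")
            else:
                types.setdefault(path, set()).add(type(value).__name__)
                counts[path] = counts.get(path, 0) + 1

    for record in records:
        visit(record, "")
    n = len(records)
    return {path: {"types": sorted(ts), "optional": counts[path] < n}
            for path, ts in types.items()}


# Two hypothetical sensor readings that do not share an identical structure.
raw = [
    '{"sensor": "t1", "temp": 21.5}',
    '{"sensor": "t2", "temp": 22, "loc": {"room": "ICU"}}',
]
schema = infer_schema([json.loads(line) for line in raw])
```

Here `temp` is reported with both `float` and `int` types, and `loc.room` is marked optional. Data is accepted as it arrives, and a schema is derived afterwards rather than enforced up front.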
Archive | 2013
Michael N. Gubanov; Linda G. Shapiro; Anna Pyayt
While the problem of finding needed information on the Web is being solved by the major search engines, access to information in Big text, large-scale text datasets, and documents (biomedical literature, e-books, conference proceedings, etc.) is still very rudimentary (Lin and Cohen (2010) A Very Fast Method for Clustering Big Text Datasets. In: ECAI, Lisbon). Thus, keyword search is often the only way to find the needle in the haystack. The Semantic Web research community has produced an abundance of relevant results that offer more robust access interfaces than keyword search. Here we describe a new information retrieval engine that offers an advanced user experience by combining keyword search with navigation over an automatically inferred hierarchical document index. Representing the browsing index internally as a collection of UFOs (Gubanov et al. (2009) IBM UFO Repository. In: VLDB, Lyon; Gubanov et al. (2011) Learning Unified Famous Objects (UFO) to Bootstrap Information Integration. In: IEEE IRI, Las Vegas) yields more relevant search results and improves the user experience.