Michael N. Gubanov | Researchain

Publication

Featured research published by Michael N. Gubanov.

international conference on data engineering | 2008

Model Management Engine for Data Integration with Reverse-Engineering Support

Michael N. Gubanov; Philip A. Bernstein; Alexander Moshchuk

Model management is a high-level programming language designed to efficiently manipulate schemas and mappings. It comprises robust operators that, combined in short programs, can solve complex metadata-oriented problems compactly. For instance, countless enterprise data integration scenarios can be expressed easily in this high-level language, saving hundreds of development man-hours. Here we present the first model management engine that has reverse-engineering support for data integration, which is one of the most pressing metadata-oriented problems. It merges two schemas based on the mappings between them and allows the user to correct the result while keeping all the mappings in sync automatically. For the user, this is much more convenient than determining which mappings to correct in order to get the desired result. In addition, the engine supports restructuring merging, which is important when the sources are structured differently and cannot be mapped directly. While making schema merging fully automatic is not yet possible, our work simplifies and automates this process to make it practical in complex data integration scenarios.
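To make the mapping-driven merge described above concrete, here is a minimal Python sketch. It is not the engine from the paper: it assumes flat schemas (lists of attribute names) and one-to-one correspondences, and the names `merge_schemas` and `rename` are hypothetical. It only illustrates how a merged schema can carry mappings back to both sources, and how a user correction (a rename) can keep those mappings in sync.

```python
def merge_schemas(left, right, correspondences):
    """Merge two flat schemas (lists of attribute names).

    correspondences maps a left attribute to the right attribute it matches.
    Returns (merged, to_left, to_right): the merged attribute list plus
    mappings from merged attributes back to their source attributes.
    """
    merged, to_left, to_right = [], {}, {}
    matched_right = {correspondences[a] for a in left if a in correspondences}
    for attr in left:
        merged.append(attr)
        to_left[attr] = attr
        if attr in correspondences:
            to_right[attr] = correspondences[attr]
    for attr in right:
        if attr in matched_right:
            continue  # already represented by its left counterpart
        # Assumption: unmatched name clashes get a "_right" suffix.
        name = attr if attr not in merged else f"{attr}_right"
        merged.append(name)
        to_right[name] = attr
    return merged, to_left, to_right


def rename(merged, to_left, to_right, old, new):
    """Apply a user correction (rename) and keep both mappings in sync."""
    merged[merged.index(old)] = new
    for mapping in (to_left, to_right):
        if old in mapping:
            mapping[new] = mapping.pop(old)
```

With `left = ["id", "name", "phone"]`, `right = ["cust_id", "full_name", "email", "phone"]`, and correspondences `id -> cust_id`, `name -> full_name`, the unmatched `phone` from the right schema is kept as a separate attribute; renaming it afterwards updates the mapping to the right source automatically.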

information reuse and integration | 2011

Learning Unified Famous Objects (UFO) to bootstrap information integration

Michael N. Gubanov; Linda G. Shapiro; Anna Pyayt

For at least a decade, the WWW, large enterprises, and desktop users have suffered from an inability to efficiently access and manage their data. To help automate different aspects of this challenging problem, many solutions have been proposed in both academic and industrial research. One of them, the UFO Repository introduced in [5], is currently gaining momentum by advocating an object-oriented approach to help manage information overflow. An overview and evaluation of the UFO approach was published and reviewed in [5, 1]. This paper focuses on the algorithms for UFO creation and knowledge accumulation. We describe and evaluate these algorithms, and demonstrate that UFO learning is surprisingly fast and accurate across several domains, even with a small amount of initial training data.

international conference on data engineering | 2017

Scalable Linear Algebra on a Relational Database System

Shangyu Luo; Zekai J. Gao; Michael N. Gubanov; Luis Leopoldo Perez; Christopher Jermaine

As data analytics has become an important application for modern data management systems, a new category of data management system has appeared recently: the scalable linear algebra system. In this paper, we argue that a parallel or distributed database system is actually an excellent platform upon which to build such functionality. Most relational systems already have support for cost-based optimization, which is vital to scaling linear algebra computations, and it is well known how to make relational systems scale. We show that by making just a few changes to a parallel/distributed relational database system, such a system can be a competitive platform for scalable linear algebra. Taken together, our results should at least raise the possibility that brand new systems designed from the ground up to support scalable linear algebra are not absolutely necessary, and that such systems could instead be built on top of existing relational technology. Our results also suggest that if scalable linear algebra is to be added to a modern dataflow platform such as Spark, it should be added on top of the system's more structured (relational) data abstractions, rather than being constructed directly on top of the system's raw dataflow operators.
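As an illustration of why a relational engine is a plausible host for linear algebra, the sketch below expresses matrix multiplication as a join plus a grouped aggregate. It is a generic, well-known encoding (one tuple per matrix entry), not the design from the paper, and it uses an in-memory SQLite database purely as a stand-in for a parallel or distributed system. The table names `A` and `B` and the function name `matmul_sql` are assumptions made for this example.

```python
import sqlite3


def matmul_sql(a, b):
    """Multiply two dense matrices (lists of rows) with a relational join."""
    conn = sqlite3.connect(":memory:")
    # One tuple per matrix entry: (row index, column index, value).
    conn.execute("CREATE TABLE A (i INTEGER, k INTEGER, v REAL)")
    conn.execute("CREATE TABLE B (k INTEGER, j INTEGER, v REAL)")
    conn.executemany(
        "INSERT INTO A VALUES (?, ?, ?)",
        [(i, k, v) for i, row in enumerate(a) for k, v in enumerate(row)],
    )
    conn.executemany(
        "INSERT INTO B VALUES (?, ?, ?)",
        [(k, j, v) for k, row in enumerate(b) for j, v in enumerate(row)],
    )
    # C[i][j] = sum over k of A[i][k] * B[k][j]: a join on k, then a group-by.
    rows = conn.execute(
        "SELECT A.i, B.j, SUM(A.v * B.v) "
        "FROM A JOIN B ON A.k = B.k "
        "GROUP BY A.i, B.j"
    ).fetchall()
    conn.close()
    out = [[0.0] * len(b[0]) for _ in range(len(a))]
    for i, j, v in rows:
        out[i][j] = v
    return out
```

Because the multiplication is an ordinary join and aggregation, the database's query optimizer chooses the join order and execution strategy, which is the kind of cost-based optimization the abstract refers to.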

bioinformatics and biomedicine | 2011

Using unified famous objects (UFO) to automate Alzheimer's disease diagnostics

Michael N. Gubanov; Linda G. Shapiro

In this paper we discuss automatic pre-diagnostics of Alzheimer's disease using a new object-oriented data integration technology, UFO (Unified Famous Objects). UFO was originally introduced in [3] to simplify access to heterogeneous data in federated data sources.

business intelligence for the real-time enterprises | 2008

Simplifying Information Integration: Object-Based Flow-of-Mappings Framework for Integration

Bogdan Alexe; Michael N. Gubanov; Mauricio A. Hernández; C. T. Howard Ho; Jen Wei Huang; Yannis Katsis; Lucian Popa; Barna Saha; Ioana Stanoi

The Clio project at IBM Almaden investigates foundational aspects of data transformation, with particular emphasis on the design and execution of schema mappings. We now use Clio as part of a broader data-flow framework in which mappings are just one component. These data-flows express complex transformations between several source and target schemas and require multiple mappings to be specified. This paper describes research issues we have encountered as we try to create and run these mapping-based data-flows. In particular, we describe how we use Unified Famous Objects (UFOs), a schema abstraction similar to business objects, as our data model, how we reason about flows of mappings over UFOs, and how we create and deploy transformations into different run-time engines.

information reuse and integration | 2011

READFAST: Browsing large documents through unified famous objects (UFO)

Michael N. Gubanov; Anna Pyayt; Linda G. Shapiro

While the problem of finding needed information on the Web is being solved by the major search engines, access to the information in large text documents (e-books, conference proceedings, product manuals, etc.) is still very rudimentary. Keyword search is often the only way to find the needle in the haystack. There is an abundance of relevant research results in the Semantic Web research community that offer more robust access interfaces than keyword search. Here we describe a new hybrid document browser that offers an advanced user experience, combining keyword search with navigation over an automatically inferred hierarchical document index. The internal representation of the browsing index as a collection of UFOs [23] yields more relevant search results and improves user experience.
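The idea of combining keyword search with navigation over a hierarchical index can be sketched as follows. This is a simplified illustration, not the READFAST implementation: it assumes the hierarchy is already marked with `#`-style heading lines rather than inferred automatically, and the functions `build_index` and `search` are hypothetical names.

```python
def build_index(lines):
    """Build a hierarchical index from lines of text.

    Assumption: headings are marked with leading '#' characters, one per
    nesting level. Returns a list of (path, text) pairs, where path is the
    tuple of enclosing heading titles.
    """
    path, body, sections = [], [], []

    def flush():
        if body:
            sections.append((tuple(path), " ".join(body)))
            body.clear()

    for line in lines:
        stripped = line.strip()
        if stripped.startswith("#"):
            flush()  # close the section that was open before this heading
            level = len(stripped) - len(stripped.lstrip("#"))
            del path[level - 1:]
            path.append(stripped[level:].strip())
        elif stripped:
            body.append(stripped)
    flush()
    return sections


def search(sections, keyword, under=()):
    """Return paths of sections containing keyword, limited to a subtree."""
    prefix = tuple(under)
    needle = keyword.lower()
    return [
        path
        for path, text in sections
        if path[: len(prefix)] == prefix and needle in text.lower()
    ]
```

Passing `under` restricts the keyword search to the part of the hierarchy the reader has navigated to, which is the hybrid behavior the abstract describes.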

information reuse and integration | 2013

A real-time classification algorithm for emotion detection using portable EEG

Surya Cheemalapati; Michael N. Gubanov; Michael Del Vale; Anna Pyayt

Military personnel, airplane pilots, and bus drivers often operate in stressful conditions in which something unexpected can happen and cause dangerous consequences if they do not respond properly. Additionally, stress adversely affects human decision-making abilities, so prompt, preferably real-time detection of fear is very important. According to previous studies of non-portable multi-electrode electroencephalography (EEG) systems, the ratio of the power of the slow waves to that of the fast waves increases when a person is relaxed and decreases when s/he is scared. In this study we test a small portable EEG and develop algorithms for real-time detection of one stressful condition: fear. During the experiment we compare EEG signals of subjects in a relaxed state with those in a stressed state while they are watching a scene from a scary movie. The ratio of the slow/fast wave powers was measured, and the observed pattern was similar to the one obtained using a multi-electrode system. We integrate stream-processing algorithms in the system to ensure real-time detection of any changes in mental condition and to generate the alarm event in a timely manner.
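The slow/fast power ratio lends itself to a short worked example. The sketch below is not the authors' algorithm: the band limits (8 to 13 Hz as "slow", 13 to 30 Hz as "fast"), the window length, and the alarm threshold are all assumptions chosen for illustration, since the abstract does not state them. It computes band power with a plain discrete Fourier transform over fixed-size windows of a sample stream and raises an alarm when the ratio drops below the threshold.

```python
import cmath
import math


def band_power(samples, rate_hz, lo_hz, hi_hz):
    """Sum of squared DFT magnitudes for frequencies in [lo_hz, hi_hz)."""
    n = len(samples)
    power = 0.0
    for k in range(1, n // 2 + 1):
        freq = k * rate_hz / n
        if lo_hz <= freq < hi_hz:
            coeff = sum(
                x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(samples)
            )
            power += abs(coeff) ** 2
    return power


def slow_fast_ratio(samples, rate_hz, slow=(8.0, 13.0), fast=(13.0, 30.0)):
    """Ratio of slow-band power to fast-band power (band limits assumed)."""
    slow_power = band_power(samples, rate_hz, *slow)
    fast_power = band_power(samples, rate_hz, *fast)
    return slow_power / fast_power if fast_power > 0 else math.inf


def detect_stress(stream, rate_hz, window, threshold):
    """Yield (window_index, ratio, alarm) per non-overlapping window.

    alarm is True when the ratio falls below the (hypothetical) threshold.
    """
    buffer, index = [], 0
    for sample in stream:
        buffer.append(sample)
        if len(buffer) == window:
            ratio = slow_fast_ratio(buffer, rate_hz)
            yield index, ratio, ratio < threshold
            buffer, index = [], index + 1
```

Processing the stream window by window means an alarm can be raised as soon as one window of samples has arrived, rather than after the whole recording is available.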

information reuse and integration | 2012

MEDREADFAST: A structural information retrieval engine for big clinical text

Michael N. Gubanov; Anna Pyayt

Large-scale text mining research, informally called Big text, is a crucial part of the Big data agenda that has recently started gaining momentum [24]. It targets new technologies to manage large amounts of unstructured textual data in order to quickly find and retrieve needed information. In the medical domain, fast access to information is especially important. Keyword search, the de facto standard for searching Electronic Health Records (EHR), is simple and therefore popular, but it is not ideal and often returns either too many irrelevant or too few relevant results. Clinicians, usually very short on time, cannot afford the trial and error of keyword search and therefore do not use all the information available in patient records. Next-generation patient care requires more efficient access to the valuable information hidden in patient histories represented by millions of patient records. There is an abundance of relevant research results in the Semantic Web research community that offer more robust access interfaces to unstructured data than keyword search. Here we describe a new hybrid browser specifically for EHR that offers an advanced user experience, combining keyword search with navigation over an automatically inferred hierarchical document index. The internal representation of the browsing index as a collection of UFOs [25] yields more relevant search results and improves user experience.

international conference on data engineering | 2017

PolyFuse: A Large-Scale Hybrid Data Fusion System

Michael N. Gubanov

The big data era brought us petabytes of data together with the challenges of storing and efficiently accessing large-scale datasets. It also surprised everyone with an enormous variety of data sources and types, and correspondingly different data models. Dealing with a variety of those data models turned out to be a “hard nut to crack” for almost all existing data management engines. Data integration, a mature field addressing problems of accessing and fusing data residing in more than one data source, over the years came up with feasible semi-automatic solutions, most of which efficiently handle a handful of data sources represented in one or two different data models (e.g. relational and semistructured) [Haas et al., 2005], [Gubanov et al., 2008]. While this is significant progress, most of the solutions do not easily scale up, since they usually require some sort of human assistance, which is infeasible at scale. Unified Famous Object (UFO) [Gubanov et al., 2009], [Bellahsene et al., 2011], [Gubanov et al., 2011a] was one of the first attempts to crack data fusion at scale by introducing a new abstraction called UFO that could incrementally learn different representations of data objects in different sources and, over time, train itself to recognize and map them with high accuracy without supervision. Having trained many such UFOs, the system would get more and more powerful, as it learned to recognize and access data objects across many sources without supervision. While UFO was definitely progress towards scaling up data fusion, it was not a “silver bullet”, because it was built mostly for relational data. Hence, there is a need for a new large-scale data integration system that could handle many types of data at scale. This paper sketches its architecture and envisions potential research challenges. First, a few key principles that such a system should follow are described; a discussion of architecture and research avenues follows.
Data Acquisition: Ingest any incoming data for further classification and processing.
No Schema: Incoming data does not need to fit a pre-defined schema. Instead, the schema is extracted from the data if needed.
Scale: Designed to accommodate and process large-scale heterogeneous data sets, for example a fire hose of high-velocity sensor data or unstructured discharge reports from a hospital.
Seamless Data Fusion: Unsupervised, source-oblivious fusion at scale to satisfy the user query.
“Esperanto” Query Language: A hybrid query language enabling access to different data models used by heterogeneous data. It might be a result of partial composition of several query languages or a new language designed based on a common data model.

The first principle sets POLYFUSE apart from the traditional group of DBMSes that were built for relational data models and have been around for decades [Stonebraker et al., 1996]. The second principle removes the traditional requirement that data “obey” a pre-defined schema in order to get into a database. The third principle means the architecture should be designed to support very large heterogeneous datasets from the very beginning. The fourth and fifth principles indicate that, due to scale, any fusion algorithms should be unsupervised and any query language should be compatible with the different data models in POLYFUSE.
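The "No Schema" principle can be illustrated with a small schema-extraction sketch. This is not part of POLYFUSE: it assumes incoming records are JSON-like Python dictionaries, and the function name `infer_schema` and the dotted-path convention for nested fields are choices made only for this example.

```python
def infer_schema(records):
    """Extract a schema from schemaless records.

    Maps each field path (nested keys joined with '.') to the sorted list
    of Python type names observed for that field across all records.
    """
    observed = {}

    def walk(prefix, value):
        if isinstance(value, dict):
            for key, child in value.items():
                walk(f"{prefix}.{key}" if prefix else key, child)
        else:
            observed.setdefault(prefix, set()).add(type(value).__name__)

    for record in records:
        walk("", record)
    return {path: sorted(types) for path, types in observed.items()}
```

Records are ingested as they arrive, and the schema is derived afterwards; a field that appears with several types (for example an `id` that is sometimes an integer and sometimes a string) is reported with all of them instead of being rejected at load time.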

Archive | 2013

ReadFast: Structural Information Retrieval from Biomedical Big Text by Natural Language Processing

Michael N. Gubanov; Linda G. Shapiro; Anna Pyayt

While the problem of finding needed information on the Web is being solved by the major search engines, access to the information in Big text, large-scale text datasets, and documents (biomedical literature, e-books, conference proceedings, etc.) is still very rudimentary (Lin and Cohen (2010) A very fast method for clustering big text datasets. In: ECAI, Lisbon). Keyword search is often the only way to find the needle in the haystack. There is an abundance of relevant research results in the Semantic Web research community that offer more robust access interfaces than keyword search. Here we describe a new information retrieval engine that offers an advanced user experience, combining keyword search with navigation over an automatically inferred hierarchical document index. The internal representation of the browsing index as a collection of UFOs (Gubanov et al. (2009) IBM UFO repository. In: VLDB, Lyon; Gubanov et al. (2011) Learning unified famous objects (UFO) to bootstrap information integration. In: IEEE IRI, Las Vegas) yields more relevant search results and improves user experience.
