Bogdan Alexe
IBM
Publication
Featured research published by Bogdan Alexe.
very large data bases | 2008
Bogdan Alexe; Wang Chiew Tan; Yannis Velegrakis
A fundamental problem in information integration is to precisely specify the relationships, called mappings, between schemas. Designing mappings is a time-consuming process. To alleviate this problem, many mapping systems have been developed to assist the design of mappings. However, a benchmark for comparing and evaluating these systems has not yet been developed. We present STBenchmark, a solution towards a much-needed benchmark for mapping systems. We first describe the challenges that are unique to the development of benchmarks for mapping systems. We then describe the three components of STBenchmark: (1) a basic suite of mapping scenarios that we believe represents a minimum set of transformations that should be readily supported by any mapping system, (2) a mapping scenario generator and an instance generator that can produce, respectively, complex mapping scenarios and instances of varying sizes for a given schema, and (3) a simple usability model that can be used as a first-cut measure of the ease of use of a mapping system. We use STBenchmark to evaluate four mapping systems and report our results, as well as some interesting observations.
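The basic suite of mapping scenarios in component (1) can be pictured with a minimal sketch; the function name and data below are invented for illustration and are not part of STBenchmark's actual suite or API:

```python
# Minimal sketch of one basic mapping scenario ("unnesting"): source
# records with a nested employee list are flattened into a flat target
# relation. Names and data are illustrative only.
def unnest(source):
    target = []
    for rec in source:
        for emp in rec["employees"]:
            target.append({"dept": rec["dept"], "name": emp})
    return target

src = [{"dept": "R&D", "employees": ["Ana", "Bo"]},
       {"dept": "HR", "employees": ["Cy"]}]
tgt = unnest(src)
```

An instance generator in the spirit of component (2) would produce source instances like `src` at varying sizes, so that a mapping system's output can be checked against the expected `tgt`.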
international conference on data engineering | 2008
Bogdan Alexe; Laura Chiticariu; Renée J. Miller; Wang Chiew Tan
A fundamental problem in information integration is that of designing the relationships, called schema mappings, between two schemas. The specification of a semantically correct schema mapping is typically a complex task. Automated tools can suggest potential mappings, but few tools are available for helping a designer understand mappings and design alternative mappings. We describe Muse, a mapping design wizard that uses data examples to assist designers in understanding and refining a schema mapping towards the desired specification. We present the novel algorithms behind Muse and show how Muse systematically guides the designer on two important components of a mapping design: the specification of the desired grouping semantics for sets of data and the choice among alternative interpretations for semantically ambiguous mappings. In every component, Muse infers the desired semantics based on the designer's actions on a short sequence of small examples. Whenever possible, Muse draws examples from a familiar database, thus facilitating the design process even further. We report our experience with Muse on some publicly available schemas.
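The grouping-semantics choice described above can be sketched on a small data example (all names here are hypothetical and this is not Muse's interface):

```python
# Two alternative grouping semantics for the same source tuples: one
# target group per dept, or one per (dept, city). A tool like Muse shows
# the designer both resulting instances and infers the intended semantics
# from which one the designer picks.
rows = [("R&D", "NY", "Ana"), ("R&D", "SF", "Bo"), ("HR", "NY", "Cy")]

def grouped(rows, key):
    out = {}
    for dept, city, name in rows:
        out.setdefault(key(dept, city), []).append(name)
    return out

by_dept = grouped(rows, lambda d, c: d)            # {'R&D': ['Ana', 'Bo'], 'HR': ['Cy']}
by_dept_city = grouped(rows, lambda d, c: (d, c))  # a separate group per city
```

On this example the two semantics visibly disagree: the R&D employees end up in one set or in two, and that difference is what a data example makes easy to see.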
very large data bases | 2008
Bogdan Alexe; Wang Chiew Tan; Yannis Velegrakis
Schema mappings are fundamental building blocks in many information integration applications. Designing mappings is a time-consuming process, and for that reason many mapping systems have been developed to assist in this task. However, to the best of our knowledge, a benchmark for comparing and evaluating these systems has not yet been developed. We demonstrate STBenchmark, a benchmark that we have developed for evaluating mapping systems. Our demonstration will showcase the different aspects of mapping systems that STBenchmark evaluates, highlight the results of our comparison and evaluation of four mapping systems, and make a case for a standard specification input mechanism for mapping systems, a step towards a uniform testbed or repository for schema mappings and data exchange tasks.
international conference on management of data | 2008
Bogdan Alexe; Laura Chiticariu; Renée J. Miller; Daniel Pepper; Wang Chiew Tan
Schema mappings are logical assertions that specify the relationships between a source and a target schema in a declarative way. The specification of such mappings is a fundamental problem in information integration. Mappings can be generated by existing mapping systems (semi-)automatically from a visual specification between two schemas. In general, the well-known 80-20 rule applies for mapping generation tools. They can automate 80% of the work, covering common cases and creating a mapping that is close to correct. However, ensuring complete correctness can still require intricate manual work to perfect portions of the mapping. Previous research on mapping understanding and refinement, and anecdotal evidence from mapping designers, suggest that the mapping design process can be improved by using data examples to explain the mapping and alternative mappings. We demonstrate Muse, a data-example-driven mapping design tool currently implemented on top of the Clio schema mapping system. Muse leverages data examples that are familiar to a designer to illustrate nuances of how a small change to a mapping specification changes its semantics. We demonstrate how Muse can differentiate between alternative mapping specifications and infer the desired mapping semantics based on the designer's actions on a short sequence of simple data examples.
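As a rough sketch of what a declarative source-to-target assertion looks like when applied to data (relation and attribute names are invented, and this is not Clio's or Muse's API), consider the mapping "for every source fact Emp(name, dept), the target contains Dept(dept) and Works(name, dept)":

```python
# Sketch: applying a source-to-target assertion to a source instance.
# For each Emp(name, dept) fact, the target must contain Dept(dept)
# and Works(name, dept). All names are invented for illustration.
def chase(emps):
    depts = {(d,) for _, d in emps}
    works = {(n, d) for n, d in emps}
    return depts, works

depts, works = chase([("Ana", "R&D"), ("Bo", "R&D")])
```

A small change to the assertion (say, adding a join condition or changing which target relation receives `dept`) produces a visibly different target instance on the same two-tuple example, which is the kind of nuance data examples surface.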
very large data bases | 2004
Serge Abiteboul; Bogdan Alexe; Omar Benjelloun; Bogdan Cautis; Irini Fundulaki; Tova Milo; Arnaud Sahuguet
This chapter investigates how an electronic patient record (EPR) document can be managed by a number of Active XML (AXML) peers representing hospitals, MDs, insurance companies, and the Department of Health. These peers live on remote servers, laptops, and mobile devices. Each of them provides integrated information services, filtering services, or a combination of the two: a hospital provides information about visits, an insurance company provides reimbursement reports, and access control over both is enforced by the regulations of the Department of Health as well as by each peer's own privacy policies. The chapter also illustrates how the distributed data can be queried by different users, and how the specified access control rules are enforced along the way. Finally, it investigates how queries are executed efficiently: only the relevant/permissible parts of the AXML document are exchanged among peers, and an AXML peer can fully or partially evaluate a query by delegating some of the computation to filtering peers or information sources.
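The lazy, access-controlled materialization described above can be caricatured in a few lines; everything here is invented for illustration, and real AXML documents embed service calls inside XML rather than Python dictionaries:

```python
# Toy sketch: a "document" mixing plain data with embedded service calls.
# A call is evaluated only when the requesting user is permitted to see
# that part, so only relevant/permissible fragments are computed and
# exchanged among peers.
def materialize(doc, user, permitted):
    out = {}
    for key, val in doc.items():
        if callable(val):                 # embedded service call
            if (user, key) in permitted:  # access control check
                out[key] = val()          # delegate to the remote peer
        else:
            out[key] = val
    return out

doc = {"patient": "P17",
       "visits": lambda: ["2004-01-02"],        # hospital's service
       "reimbursements": lambda: ["$120"]}      # insurer's service
view = materialize(doc, "md", {("md", "visits")})
```

In this sketch the MD's view contains the visit data but never triggers the insurer's service, mirroring how permissible parts are evaluated on demand.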
international conference on big data | 2013
Mauricio A. Hernández; Kirsten Hildrum; Prateek Jain; Rohit Wagle; Bogdan Alexe; Rajasekar Krishnamurthy; Ioana Stanoi; Chitra Venkatramani
Social media is playing a growing role in providing consumer feedback to companies about their products and services. To maximize the benefit of this feedback, companies want to know how the different consumer segments they are interested in, such as parents, frequent travelers, and comic book fans, react to their products and campaigns. In this paper, we describe how constructing consumer profiles is valuable for obtaining such insights. We present the challenges in analyzing noisy social media data and the techniques we employ for building the profiles. We also present detailed experimental results from the analysis of over seven billion messages to construct profiles of over 100 million consumers. We demonstrate how consumer profiles can help in understanding consumer feedback by different key segments using a TV show analysis scenario.
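A hedged sketch of the profile-construction idea: tag each message with consumer segments and aggregate per author. The keyword rules below are crude stand-ins for the paper's actual extraction and resolution techniques, and all names are invented:

```python
# Toy profile construction: assign segments to messages via keyword cues,
# then aggregate the segments observed for each author into a profile.
RULES = {"parent": ("my kids", "daycare"),
         "traveler": ("boarding", "layover")}

def build_profiles(messages):
    profiles = {}
    for author, text in messages:
        segments = profiles.setdefault(author, set())
        for segment, cues in RULES.items():
            if any(cue in text.lower() for cue in cues):
                segments.add(segment)
    return profiles

profiles = build_profiles([
    ("u1", "My kids loved the show"),
    ("u1", "Long layover in ORD"),
    ("u2", "Boarding now!"),
])
```

Even this toy version shows why noise matters: a single misleading cue mislabels a consumer, which is why the paper invests in more robust extraction over billions of messages.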
international conference on management of data | 2012
Bogdan Alexe; Mauricio A. Hernández; Kirsten Hildrum; Rajasekar Krishnamurthy; Georgia Koutrika; Meenakshi Nagarajan; Haggai Roitman; Michal Shmueli-Scheuer; Ioana Stanoi; Chitra Venkatramani; Rohit Wagle
We propose to demonstrate an end-to-end framework for leveraging time-sensitive and critical social media information for businesses. More specifically, we focus on identifying, structuring, integrating, and exposing timely insights that are essential to marketing services and monitoring reputation over social media. Our system includes components for information extraction from text, entity resolution and integration, analytics, and a user interface.
business intelligence for the real-time enterprises | 2008
Bogdan Alexe; Michael N. Gubanov; Mauricio A. Hernández; C. T. Howard Ho; Jen Wei Huang; Yannis Katsis; Lucian Popa; Barna Saha; Ioana Stanoi
The Clio project at IBM Almaden investigates foundational aspects of data transformation, with particular emphasis on the design and execution of schema mappings. We now use Clio as part of a broader data-flow framework in which mappings are just one component. These data-flows express complex transformations between several source and target schemas and require multiple mappings to be specified. This paper describes research issues we have encountered as we try to create and run these mapping-based data-flows. In particular, we describe how we use Unified Famous Objects (UFOs), a schema abstraction similar to business objects, as our data model, how we reason about flows of mappings over UFOs, and how we create and deploy transformations into different run-time engines.
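A flow of mappings over a shared intermediate object model can be sketched as plain function composition; the field names and "UFO" shape below are invented for illustration and do not reflect Clio's internals:

```python
# Sketch: two mappings composed into a data flow through a shared
# intermediate ("UFO"-like) representation. Reasoning about the flow
# then becomes reasoning about the composition of the two mappings.
def src_to_ufo(rows):
    # first mapping: source tuples -> intermediate objects
    return [{"person": n, "org": d} for n, d in rows]

def ufo_to_target(ufos):
    # second mapping: intermediate objects -> target tuples
    return [(u["org"], u["person"]) for u in ufos]

def flow(rows):
    return ufo_to_target(src_to_ufo(rows))

result = flow([("Ana", "IBM")])
```

The point of a shared abstraction like this is that either mapping can be redesigned or redeployed to a different runtime engine without touching the other side of the flow.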
In Search of Elegance in the Theory and Practice of Computation | 2013
Bogdan Alexe; Douglas Burdick; Mauricio A. Hernández; Georgia Koutrika; Rajasekar Krishnamurthy; Lucian Popa; Ioana Stanoi; Ryan Wisnesky
Data integration remains a perennially difficult task. The need to access, integrate, and make sense of large amounts of data has, in fact, intensified in recent years. There are now many publicly available sources of data that can provide valuable information in various domains. Concrete examples of public data sources include: bibliographic repositories (DBLP, Cora, Citeseer), online movie databases (IMDB), knowledge bases (Wikipedia, DBpedia, Freebase), and social media data (Facebook and Twitter, blogs). Additionally, a number of more specialized public data repositories are starting to play an increasingly important role. These repositories include, for example, U.S. federal government data, congress and census data, as well as financial reports archived by the U.S. Securities and Exchange Commission (SEC).
In Search of Elegance in the Theory and Practice of Computation | 2013
Bogdan Alexe; Wang-Chiew Tan
One of the fundamental tasks in information integration is to specify the relationships, called schema mappings, between database schemas. Schema mappings specify how data structured under a source schema is to be transformed into data structured under a target schema. The design of schema mappings is usually a non-trivial and time-intensive process and the task of designing schema mappings is exacerbated by the fact that schemas that occur in real life tend to be large and heterogeneous. Traditional approaches for designing schema mappings are either manual or performed through a user interface from which a schema mapping is interpreted from correspondences between attributes of the source and target schemas. These correspondences are either specified by the user or automatically derived by applying schema matching on the two schemas.
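The last step described above, interpreting attribute correspondences into an executable transformation, can be sketched minimally; this ignores joins, nesting, and the semantic ambiguity that makes real mapping design hard, and all attribute names are illustrative:

```python
# Sketch: attribute correspondences {target_attr: source_attr}, either
# drawn by the user or produced by a schema matcher, interpreted as a
# simple one-relation transformation over source rows.
def interpret(correspondences, source_rows):
    return [{t: row[s] for t, s in correspondences.items()}
            for row in source_rows]

corr = {"fullname": "name", "division": "dept"}
out = interpret(corr, [{"name": "Ana", "dept": "R&D", "salary": 100}])
```

Note that uncorresponded source attributes (here, `salary`) are simply dropped; deciding whether that is the intended semantics is exactly the kind of question the mapping design process must resolve.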