Gregor Endler

University of Erlangen-Nuremberg

Publication

Featured research published by Gregor Endler.

distributed event-based systems | 2015

An algebra for pattern matching, time-aware aggregates and partitions on relational data streams

Sebastian Herbst; Niko Pollner; Johannes Tenschert; Frank Lauterwald; Gregor Endler; Klaus Meyer-Wegener

Many interesting applications of continuous-query processing are concerned with pattern matching or complex temporal aggregation of events. Real-world queries that rely on these operations are difficult to implement in current stream-processing systems. The reason seems to be a gap between two types of existing query languages: Some languages (e. g. CQL) offer a small set of simple operators that can be combined in order to create complex queries. While these languages provide sound and comprehensible semantics, they lack the expressiveness required for many real-world applications. Other approaches (e. g. Aurora) provide powerful operators but lack semantic strictness, which is required for reasoning about query results. Such reasoning is a prerequisite for safe query optimization. We try to bridge this gap by integrating operators for pattern matching and time-aware aggregates into a general-purpose stream model featuring stream partitioning. These operators can answer several questions that we have found to be relevant in a real-world object-tracking scenario. Moreover, they are formally defined, allowing expressive and efficient queries to be written in CQL-like languages, while remaining understandable and easy to use.

advances in databases and information systems | 2017

Query-Driven Knowledge-Sharing for Data Integration and Collaborative Data Science

Andreas M. Wahl; Gregor Endler; Peter K. Schwab; Sebastian Herbst; Richard Lenz

Writing effective analytical queries requires data scientists to have in-depth knowledge of the existence, semantics, and usage context of data sources. Once gathered, such knowledge is informally shared within a specific team of data scientists, but usually is neither formalized nor shared with other teams. Potential synergies remain unused. We introduce our novel approach of Query-driven Knowledge-Sharing Systems (QKSS). A QKSS extends a data management system with knowledge-sharing capabilities to facilitate user collaboration without altering data analysis workflows. Collective knowledge from the query log is extracted to support data source discovery and data integration. Knowledge is formalized to enable its sharing across data scientist teams.

international conference on management of data | 2012

Data quality and integration in collaborative environments

Gregor Endler

The trend to merge medical practices into cooperatively operating networks and organizational units like Medical Supply Centers generates new challenges for adequate IT support. In particular, new use cases for common economic planning, controlling and treatment coordination arise. This requires consolidation of data originating from heterogeneous and autonomous software systems. Heterogeneity and autonomy are core reasons for low data quality. The intuitive approach of initially integrating heterogeneous systems into a federated system creates a very high upfront effort before the system can become operable and does not adequately consider the fact that data quality requirements might change over time. To remedy this, we propose an approach for continuous data quality improvement which enables a demand-driven, step-by-step system integration. By adapting the generic Total Data Quality Management process to healthcare-specific use cases, we are developing an extended model for continuous data quality management in cooperative healthcare settings. The IT tools which are needed to provide the information that drives this process are currently in development within a government-supported project involving both industry and academia.

international conference on management of data | 2018

A graph-based framework for analyzing SQL query logs

Andreas M. Wahl; Gregor Endler; Peter K. Schwab; Julian Rith; Sebastian Herbst; Richard Lenz

Analytical SQL queries are a valuable source of information. Query log analysis can provide insight into the usage of datasets and uncover knowledge that cannot be inferred from source schemas or content alone. To unlock this potential, flexible mechanisms for meta-querying are required. Syntactic and semantic aspects of queries must be considered along with contextual information. We present an extensible framework for analyzing SQL query logs. Query logs are mapped to a multi-relational graph model and queried using domain-specific traversal expressions. To enable concise and expressive meta-querying, semantic analyses are conducted on normalized relational algebra trees with accompanying schema lineage graphs. Syntactic analyses can be conducted on corresponding query texts and abstract syntax trees. Additional metadata makes it possible to inspect the temporal and social context of each query. In this demonstration, we show how query log analysis with our framework can support data source discovery and facilitate collaborative data science. The audience can explore an exemplary query log to locate queries relevant to a data analysis scenario, conduct graph analyses on the log, and assemble a customized log-monitoring dashboard.

advances in databases and information systems | 2015

ForCE: Is Estimation of Data Completeness Through Time Series Forecasts Feasible?

Gregor Endler; Philipp Baumgärtel; Andreas M. Wahl; Richard Lenz

Measuring the completeness of a data population often requires either expert knowledge or the presence of reference data. If neither is available, measuring population completeness becomes nontrivial. We present the ForCE approach (Forecasting for Completeness Estimation), a method to estimate the completeness of timestamped data using time series forecasting. We evaluate the method’s feasibility using a medical domain real-world dataset, which we provide for download. The method is compared to three baselines. ForCE manages to surpass all three.

advances in databases and information systems | 2013

A Benchmark for Multidimensional Statistical Data

Philipp Baumgärtel; Gregor Endler; Richard Lenz

ProHTA (Prospective Health Technology Assessment) is a simulation project that aims at estimating the outcome of new medical innovations at an early stage. To this end, hybrid and modular simulations are employed. For this large-scale simulation project, efficient management of multidimensional statistical data is important. Therefore, we propose a benchmark to evaluate query processing of this kind of data in relational and non-relational databases. We compare our benchmark with existing approaches and point out differences. This paper presents a mapping to a flexible relational model, JSON documents and RDF. The queries defined for our benchmark are mapped to SQL, SPARQL, the MongoDB query language and MapReduce. Using our benchmark, we evaluate these different systems and discuss differences between them.

statistical and scientific database management | 2018

Crossing an OCEAN of queries: analyzing SQL query logs with OCEANLog

Andreas M. Wahl; Gregor Endler; Peter K. Schwab; Sebastian Herbst; Julian Rith; Richard Lenz

SQL queries encapsulate the knowledge of their authors about the usage of the queried data sources. This knowledge also contains aspects that cannot be inferred by analyzing the contents of the queried data sources alone. Due to the complexity of analytical SQL queries, specialized mechanisms are necessary to enable the user-friendly formulation of meta-queries over an existing query log. Currently existing approaches do not sufficiently consider syntactic and semantic aspects of queries along with contextual information. During our demonstration, conference participants learn how to use the latest release of OCEANLog, a framework for analyzing SQL query logs. Our demonstration encompasses several scenarios. Participants can explore an existing query log using domain-specific graph traversal expressions, set up continuous subscriptions for changes in the graph, create time-based visualizations of query results, configure an OCEANLog instance and learn how to choose which specific graph database to use. We also provide them with access to the native meta-query mechanisms of a DBMS to further emphasize the benefits of our graph-based approach.

conference on computer supported cooperative work | 2017

We Can Query More than We Can Tell: Facilitating Collaboration Through Query-Driven Knowledge-Sharing

Andreas M. Wahl; Gregor Endler; Peter K. Schwab; Sebastian Herbst; Richard Lenz

We introduce Query-driven Knowledge-Sharing Systems (QKSS), which extend data management systems with knowledge-sharing capabilities to facilitate collaboration among different teams of data scientists. Relevant tacit knowledge about data sources is extracted from SQL query logs and externalized to support data source discovery and data integration. By studying this collaborative knowledge, data scientists are enabled to formulate effective analytical queries over unfamiliar data sources.

biomedical engineering systems and technologies | 2014

Toward Pay-As-You-Go Data Integration for Healthcare Simulations

Philipp Baumgärtel; Gregor Endler; Richard Lenz

ProHTA (Prospective Health Technology Assessment) aims at understanding the impact of innovative medical processes and technologies at an early stage. To that end, large scale healthcare simulations are employed to estimate the effects of potential innovations. Simulation techniques are also utilized to detect areas with a high potential for improving the supply chain of healthcare. The data needed for both validating and adjusting these simulations typically comes from various heterogeneous sources and is often preaggregated and insufficiently documented. Thus, new data management techniques are required to cope with these conditions. Because of the high initial integration effort, we propose a pay-as-you-go approach using RDF. Thereby, data storage is separated from semantic annotation. Our proposed system offers automatic initial integration of various data sources. Additionally, it provides methods for searching semantically annotated data and for loading it into the simulation. The user can add annotations to the data in order to enable semantic integration on demand. In this paper, we demonstrate the feasibility of this approach with a prototype implementation. We discuss benefits and remaining challenges.

Archive | 2013