Michael A. Schuh | Researchain

Publication

Featured research published by Michael A. Schuh.

international conference on image processing | 2013

A large-scale solar image dataset with labeled event regions

Michael A. Schuh; Rafal A. Angryk; Karthik Ganesan Pillai; Juan M. Banda; Petrus C. H. Martens

This paper introduces a new public benchmark dataset of solar image data from the Solar Dynamics Observatory (SDO) mission. This is the first release, which contains over 15,000 images and nearly 24,000 solar events, spanning the first six months of 2012. It combines region-based event labels from six automated detection modules, ten pre-computed image parameters for each cell over a grid-based segmentation of the full resolution images, and a lower resolution version of the images for further analysis and visualization. Together, these components serve as a standardized, ready-to-use, solar image dataset for general image processing research, without requiring the necessary background knowledge to properly prepare it. We present here the fundamental dataset creation details and outline future improvements and opportunities as data collection continues for the coming years.

international conference on data mining | 2012

Spatio-temporal Co-occurrence Pattern Mining in Data Sets with Evolving Regions

Karthik Ganesan Pillai; Rafal A. Angryk; Juan M. Banda; Michael A. Schuh; Tim Wylie

Spatio-temporal co-occurring patterns represent subsets of event types that occur together in both space and time. In comparison to previous work in this field, we present a general framework to identify spatio-temporal co-occurring patterns for continuously evolving spatio-temporal events that have polygon-like representations. We also propose a set of measures to identify spatio-temporal co-occurring patterns and propose an Apriori-based spatio-temporal co-occurrence mining algorithm to find prevalent spatio-temporal co-occurring patterns for extended spatial representations that evolve over time. We evaluate our framework on real-life data to demonstrate the effectiveness of our measures and the algorithm. We present results highlighting the importance of our measures in identifying spatio-temporal co-occurrence patterns.

ieee aerospace conference | 2011

Graph-based ontology-guided data mining for D-matrix model maturation

Shane Strasser; John W. Sheppard; Michael A. Schuh; Rafal A. Angryk; Clemente Izurieta

In model-based diagnostic algorithms, it is assumed that the model is correct. If the model is incorrect, the diagnostic algorithm may diagnose the wrong fault, which can be costly and time-consuming. Using past maintenance events, one should be able to make corrections to the model in order for the diagnostic algorithm to correctly diagnose faults. In this paper, a maturation approach is proposed which uses the graph-theoretic representations of Timed Failure Propagation Graph (TFPG) models and diagnostic sessions based on recently standardized diagnostic ontologies to determine statistical discrepancies between that which is expected by the models and that which has been encountered in practice. These discrepancies are then analyzed to generate recommendations for maturing the diagnostic models. Maturation recommendations include identifying new dependencies and erroneous or tenuous dependencies.

advances in databases and information systems | 2014

Spatiotemporal Co-occurrence Rules

Karthik Ganesan Pillai; Rafal A. Angryk; Juan M. Banda; Tim Wylie; Michael A. Schuh

Spatiotemporal co-occurrence rules (STCORs) discovery is an important problem in many application domains such as weather monitoring and solar physics, which is our application focus. In this paper, we present a general framework to identify STCORs for continuously evolving spatiotemporal events that have extended spatial representations. We also analyse a set of anti-monotone (monotonically non-increasing) and non anti-monotone measures to identify STCORs. We then validate and evaluate our framework on a real-life data set and report results of the comparison of the number of candidates needed to discover actual patterns, memory usage, and the number of STCORs discovered using the anti-monotonic and non anti-monotonic measures.

advances in databases and information systems | 2014

Big Data New Frontiers: Mining, Search and Management of Massive Repositories of Solar Image Data and Solar Events

Juan M. Banda; Michael A. Schuh; Rafal A. Angryk; Karthik Ganesan Pillai; Patrick McInerney

This work presents one of the many emerging research domains where big data analysis has become an immediate need to process the massive amounts of data being generated each day: solar physics. While building a content-based image retrieval system for NASA’s Solar Dynamics Observatory mission, we have discovered research problems that can be addressed by the use of big data processing techniques and in some cases require the development of novel techniques. With over one terabyte of solar data being generated each day, and ever more missions on the horizon that expect to generate petabytes of data each year, solar physics presents many exciting opportunities. This paper presents the current status of our work with solar image data and events, our shift towards using big data methodologies, and future directions for big data processing in solar physics.

advances in databases and information systems | 2014

When Too Similar Is Bad: A Practical Example of the Solar Dynamics Observatory Content-Based Image-Retrieval System

Juan M. Banda; Michael A. Schuh; Tim Wylie; Patrick McInerney; Rafal A. Angryk

Measuring interest and relevance has always been one of the main concerns when analyzing the results of a Content-Based Image-Retrieval (CBIR) system. In this work, we present a unique problem that the Solar Dynamics Observatory (SDO) CBIR system encounters: too many highly similar images. Producing over 70,000 images of the Sun per day, the problem of finding similar images is transformed into the problem of finding similar solar events based on image similarity. However, the most similar images of our dataset are temporal neighbors capturing the same event instance. Therefore a traditional CBIR system will return highly repetitive images rather than similar but distinct events. In this work we outline the problem in detail and present several approaches tested in order to solve this important image data mining and information retrieval issue.

british national conference on databases | 2013

A comprehensive study of iDistance partitioning strategies for kNN queries and high-dimensional data indexing

Michael A. Schuh; Tim Wylie; Juan M. Banda; Rafal A. Angryk

Efficient database indexing and information retrieval tasks such as k-nearest neighbor (kNN) search still remain difficult challenges in large-scale and high-dimensional data. In this work, we perform the first comprehensive analysis of different partitioning strategies for the state-of-the-art high-dimensional indexing technique iDistance. This work greatly extends the discussion of why certain strategies work better than others over datasets of various distributions, dimensionality, and size. Through the use of novel partitioning strategies and extensive experimentation on real and synthetic datasets, our results establish an up-to-date iDistance benchmark for efficient kNN querying of large-scale and high-dimensional data and highlight the inherent difficulties associated with such tasks. We show that partitioning strategies can greatly affect the performance of iDistance and outline current best practices for using the indexing algorithm in modern application or comparative evaluation.

autotestcon | 2011

Ontology-guided knowledge discovery of event sequences in maintenance data

Michael A. Schuh; John W. Sheppard; Shane Strasser; Rafal A. Angryk; Clemente Izurieta

We created an application that facilitates improved knowledge discovery from aircraft maintenance data by transforming transactional database records into ontology-based event graphs, and then providing a filterable visualization of event sequences through time. We developed OWL ontologies based on formally defined IEEE standards, and use these ontologies to guide the data mining and data transformation processes. Our application removes much of the user's burden for data look-up and greatly increases the potential for knowledge discovery from data (KDD) in this field. We provide an easy-to-use interface that generates relevant sequences of data in a meaningful context in a fraction of the time it would take domain experts to retrieve and display similar information.

international conference on big data | 2014

Massive labeled solar image data benchmarks for automated feature recognition

Michael A. Schuh; Rafal A. Angryk

This paper introduces standard benchmarks for automated feature recognition using solar image data from the Solar Dynamics Observatory (SDO) mission. We combine general purpose image parameters extracted in-line from this massive data stream of images with reported solar event metadata records from automated detection modules to create a variety of event-labeled image datasets. These new large-scale datasets can be used for computer vision and machine learning benchmarks as-is, or as the starting point for further data mining research and investigations, the results of which can also aid understanding and knowledge discovery in the solar science community. Here we present an overview of the dataset creation process, including data collection, analysis, and labeling, which currently spans over two years of data and continues to grow with the ongoing mission. We then highlight two case studies to evaluate several data labeling methodologies and provide real world examples of our dataset benchmarks. Preliminary results show promising capability for the recognition of solar flare events and the classification of active and quiet regions of the Sun.

advances in databases and information systems | 2013

Improving the Performance of High-Dimensional kNN Retrieval through Localized Dataspace Segmentation and Hybrid Indexing

Michael A. Schuh; Tim Wylie; Rafal A. Angryk

Efficient data indexing and nearest neighbor retrieval are challenging tasks in high-dimensional spaces. This work builds upon our previous analyses of iDistance partitioning strategies to develop the backbone of a new indexing method using a heuristic-guided hybrid index that further segments congested areas of the dataspace to improve overall performance for exact k-nearest neighbor (kNN) queries. We develop data-driven heuristics to intelligently guide the segmentation of distance-based partitions into spatially disjoint sections that can be quickly and efficiently pruned during retrieval. Extensive tests are performed on k-means derived partitions over datasets of varying dimensionality, size, and cluster compactness. Experiments on both real and synthetic high-dimensional data show that our new index performs significantly better on clustered data than the state-of-the-art iDistance indexing method.
