Rafal A. Angryk | Researchain

Publication

Featured research published by Rafal A. Angryk.

international conference on data mining | 2007

GDClust: A Graph-Based Document Clustering Technique

M.S. Hossain; Rafal A. Angryk

This paper introduces a new document clustering technique based on frequent senses. The proposed system, GDClust (graph-based document clustering), works with frequent senses rather than the frequent keywords used in traditional text mining techniques. GDClust represents text documents as hierarchical document-graphs and uses an Apriori paradigm to find frequent subgraphs, which reflect frequent senses. The discovered frequent subgraphs are then used to generate sense-based document clusters. We propose a novel multilevel Gaussian minimum support approach for candidate subgraph generation. GDClust uses an English-language ontology to construct document-graphs and exploits graph-based data mining techniques for sense discovery and clustering. It is an automated system and requires minimal human interaction for clustering.
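The abstract does not give the formula behind the multilevel Gaussian minimum support. The Python sketch below shows one plausible reading, assuming (hypothetically) that the support threshold is lowest at the middle of the ontology hierarchy and rises toward its most general and most specific levels. The names `gaussian_min_support` and `frequent_edges`, the default thresholds, and the spread are all illustrative assumptions, and only the first Apriori level (single document-graph edges) is shown.

```python
import math


def gaussian_min_support(level, depth, sup_min=0.05, sup_max=0.5, sigma=None):
    """Hypothetical level-dependent minimum support.

    Assumption: an inverted Gaussian over hierarchy depth, most lenient
    (sup_min) at the middle level and approaching sup_max at the extremes.
    """
    mu = (depth - 1) / 2.0                   # assumed centre of the hierarchy
    if sigma is None:
        sigma = max(depth / 4.0, 1e-9)       # assumed spread
    bump = math.exp(-((level - mu) ** 2) / (2 * sigma ** 2))
    return sup_max - (sup_max - sup_min) * bump


def frequent_edges(edge_counts, edge_level, n_docs, depth):
    """Keep edges whose document support meets their level's threshold."""
    kept = []
    for edge, count in edge_counts.items():
        support = count / n_docs
        if support >= gaussian_min_support(edge_level[edge], depth):
            kept.append(edge)
    return sorted(kept)
```

With a five-level hierarchy, an edge at the middle level survives with 10% support while edges at the top or bottom level would need roughly 37.5%.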

international conference on image processing | 2013

A large-scale solar image dataset with labeled event regions

Michael A. Schuh; Rafal A. Angryk; Karthik Ganesan Pillai; Juan M. Banda; Petrus C. H. Martens

This paper introduces a new public benchmark dataset of solar image data from the Solar Dynamics Observatory (SDO) mission. This first release contains over 15,000 images and nearly 24,000 solar events, spanning the first six months of 2012. It combines region-based event labels from six automated detection modules, ten pre-computed image parameters for each cell of a grid-based segmentation of the full-resolution images, and a lower-resolution version of the images for further analysis and visualization. Together, these components form a standardized, ready-to-use solar image dataset for general image processing research, so users do not need the domain background knowledge required to prepare it properly. We present the fundamental dataset creation details here and outline future improvements and opportunities as data collection continues in the coming years.
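To illustrate what "pre-computed image parameters for each cell over a grid-based segmentation" means, here is a minimal pure-Python sketch. The cell size is a caller-supplied assumption, and mean, standard deviation, and Shannon entropy are stand-ins chosen for illustration; they are not claimed to be the dataset's actual ten parameters.

```python
import math


def cell_parameters(image, cell_size):
    """Compute illustrative statistics for each square cell of a 2-D image.

    `image` is a list of equal-length rows of intensities; `cell_size` must
    divide both dimensions. Returns {(cell_row, cell_col): {...}}.
    """
    rows, cols = len(image), len(image[0])
    if rows % cell_size or cols % cell_size:
        raise ValueError("cell_size must divide the image dimensions")
    grid = {}
    for r0 in range(0, rows, cell_size):
        for c0 in range(0, cols, cell_size):
            pixels = [image[r][c]
                      for r in range(r0, r0 + cell_size)
                      for c in range(c0, c0 + cell_size)]
            n = len(pixels)
            mean = sum(pixels) / n
            std = math.sqrt(sum((p - mean) ** 2 for p in pixels) / n)
            # Shannon entropy (bits) of the intensity histogram within the cell.
            hist = {}
            for p in pixels:
                hist[p] = hist.get(p, 0) + 1
            entropy = -sum((k / n) * math.log2(k / n) for k in hist.values())
            grid[(r0 // cell_size, c0 // cell_size)] = {
                "mean": mean, "std": std, "entropy": entropy}
    return grid
```

Storing a handful of numbers per cell instead of every pixel is what makes such a dataset usable without the full-resolution images.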

International Journal of Approximate Reasoning | 2010

Heuristic algorithm for interpretation of multi-valued attributes in similarity-based fuzzy relational databases

Rafal A. Angryk; Jacek M. Czerniak

In this work, we present implementation details and extended scalability tests of the heuristic algorithm that we have used previously [1,2] to discover knowledge from multi-valued data entries stored in similarity-based fuzzy relational databases. Multi-valued symbolic descriptors, which characterize individual attributes of database records, are commonly used in similarity-based fuzzy databases to reflect uncertainty about the recorded observation. In this paper, we present an algorithm that we developed to precisely interpret such non-atomic values and to convert fuzzy database tuples into forms acceptable to many regular (i.e., atomic-value-based) data mining algorithms.
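The abstract does not spell out the heuristic itself. As a much simpler stand-in, not the paper's algorithm, the sketch below shows the general idea of turning a multi-valued descriptor into weighted atomic tuples that a regular data mining algorithm can count. The uniform 1/n split and the function names are assumptions made for illustration.

```python
from collections import defaultdict


def split_multivalued(tuples, attr):
    """Expand (record, weight) pairs whose `attr` holds several values.

    Assumption: each of the n alternative values gets weight/n, so the
    total weight contributed by every original record is preserved.
    """
    out = []
    for record, weight in tuples:
        values = record[attr]
        if not isinstance(values, (set, frozenset, list, tuple)):
            values = [values]                 # already atomic
        values = sorted(set(values))
        if not values:
            raise ValueError("empty multi-valued descriptor")
        share = weight / len(values)
        for v in values:
            atomic = dict(record)
            atomic[attr] = v
            out.append((atomic, share))
    return out


def weighted_counts(tuples, attr):
    """Sum the weights of atomic tuples per value of `attr`."""
    counts = defaultdict(float)
    for record, weight in tuples:
        counts[record[attr]] += weight
    return dict(counts)
```

A record described as `{"red", "orange"}` thus contributes half a vote to each color, while an atomic `"red"` contributes a full vote.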

systems, man and cybernetics | 2007

Measuring semantic similarity using wordnet-based context vectors

Shen Wan; Rafal A. Angryk

Semantic relatedness between words or concepts is a fundamental problem in many applications of computational linguistics and artificial intelligence. In this paper, we propose a new measure based on the semantic ontology database WordNet that combines gloss information of concepts with semantic relationships and organizes concepts as high-dimensional vectors. Other relatedness measures are compared, and an experimental evaluation against several benchmark sets of human similarity ratings is presented. The context vector measure is shown to have one of the best performances.
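A minimal sketch of the context-vector idea: each concept's vector is a bag of words drawn from its own gloss plus the glosses of semantically related concepts, and relatedness is the cosine of two vectors. The toy `glosses` and `related` dictionaries below are hypothetical stand-ins for WordNet data, and the exact weighting used in the paper is not reproduced.

```python
import math
from collections import Counter


def context_vector(concept, glosses, related):
    """Word counts from a concept's gloss plus the glosses of related concepts."""
    vec = Counter(glosses[concept].lower().split())
    for other in related.get(concept, []):
        vec.update(glosses[other].lower().split())
    return vec


def cosine(u, v):
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(u[w] * v[w] for w in u if w in v)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


# Hypothetical miniature "ontology" used only for illustration.
glosses = {
    "car": "motor vehicle with four wheels",
    "truck": "large motor vehicle for carrying goods",
    "banana": "elongated yellow fruit",
    "vehicle": "conveyance that transports people or goods",
}
related = {"car": ["vehicle"], "truck": ["vehicle"]}
```

Pulling in the related concept's gloss is what lets "car" and "truck" share vocabulary such as "conveyance" and "goods" beyond their own short definitions.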

ieee international conference on fuzzy systems | 2009

On the effectiveness of fuzzy clustering as a data discretization technique for large-scale classification of solar images

Juan M. Banda; Rafal A. Angryk

This paper presents experimental results on the use of fuzzy clustering as a discretization technique for solar image recognition. By extracting texture features from our solar images and then applying fuzzy clustering techniques to these features, we were able to determine which clustering algorithm and which initialization parameters produced the best data discretization. Based on these results, we discretized some of our texture features and ran them through two different classifiers, comparing how well the classifiers performed on our original data versus the discretized data. Our experimental results demonstrate that discretizing our data via fuzzy clustering has significant potential, since our classifiers produced similar results on the original and the discretized data, and the reduction in storage space achieved through cluster-based discretization was very significant.
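As an illustration of clustering-based discretization, the sketch below runs standard fuzzy c-means on a single scalar feature and replaces each value by the rank of the cluster in which it has the highest membership. The paper compared several clustering algorithms and initializations; the evenly spaced order-statistic initialization and the names `fuzzy_cmeans_1d` and `discretize` are assumptions for this sketch.

```python
def _memberships(values, centres, m):
    """Fuzzy membership of each value in each cluster (each row sums to 1)."""
    u = []
    for x in values:
        d = [abs(x - v) for v in centres]
        if 0.0 in d:
            # Value sits exactly on a centre: crisp membership there.
            hits = [1.0 if di == 0.0 else 0.0 for di in d]
            total = sum(hits)
            u.append([h / total for h in hits])
        else:
            u.append([1.0 / sum((di / dj) ** (2.0 / (m - 1.0)) for dj in d)
                      for di in d])
    return u


def fuzzy_cmeans_1d(values, c, m=2.0, iters=100, tol=1e-6):
    """Standard fuzzy c-means on scalars; returns (centres, memberships)."""
    sv = sorted(values)
    n = len(sv)
    # Assumed deterministic init: evenly spaced order statistics.
    centres = [sv[(i * (n - 1)) // max(c - 1, 1)] for i in range(c)]
    for _ in range(iters):
        u = _memberships(values, centres, m)
        # Centre update: mean weighted by membership ** m.
        new = []
        for k in range(c):
            w = [row[k] ** m for row in u]
            new.append(sum(wi * x for wi, x in zip(w, values)) / sum(w))
        shift = max(abs(a - b) for a, b in zip(centres, new))
        centres = new
        if shift < tol:
            break
    return centres, _memberships(values, centres, m)


def discretize(values, c, **kwargs):
    """Map each value to the rank (0 = lowest centre) of its best cluster."""
    centres, u = fuzzy_cmeans_1d(values, c, **kwargs)
    order = sorted(range(c), key=lambda k: centres[k])
    rank = {k: r for r, k in enumerate(order)}
    return [rank[max(range(c), key=lambda k: row[k])] for row in u]
```

Each discretized value is a small integer label instead of a floating-point feature, which is the source of the storage savings the abstract describes.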

advances in geographic information systems | 2013

A filter-and-refine approach to mine spatiotemporal co-occurrences

Karthik Ganesan Pillai; Rafal A. Angryk; Berkay Aydin

Spatiotemporal co-occurrence patterns (STCOPs) represent subsets of event types that occur together in both space and time. However, discovering STCOPs in data sets with extended spatial representations that evolve over time is computationally expensive, both because interest measures must be calculated to assess co-occurrence strength and because the number of candidate STCOPs grows exponentially with the number of spatiotemporal event types. In this paper, we introduce a novel and effective filter-and-refine algorithm to efficiently find prevalent STCOPs in massive spatiotemporal data repositories with polygon shapes that move and evolve over time. We provide a theoretical analysis of our approach and follow it with a practical evaluation of the algorithm's effectiveness on three real-life data sets and one artificial data set.
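The general filter-and-refine pattern can be sketched as follows. The representation is assumed: each event instance is rasterized into a dict `{time_step: set of (row, col) grid cells}`. A cheap spatiotemporal bounding-box test filters out pairs that cannot co-occur, and only the survivors are refined with an exact Jaccard-style ratio of intersection volume to union volume. This is a generic illustration, not necessarily the paper's specific filter or interest measure.

```python
from itertools import product


def bounding_box(instance):
    """(t_min, t_max, r_min, r_max, c_min, c_max) of a non-empty instance."""
    ts = list(instance)
    cells = [cell for t in ts for cell in instance[t]]
    rs = [r for r, _ in cells]
    cs = [c for _, c in cells]
    return min(ts), max(ts), min(rs), max(rs), min(cs), max(cs)


def boxes_overlap(a, b):
    """True if two (min, max) triples overlap on time, rows, and columns."""
    return all(a[i] <= b[i + 1] and b[i] <= a[i + 1] for i in (0, 2, 4))


def volume_jaccard(a, b):
    """Shared cell-steps divided by total cell-steps covered by either instance."""
    inter = sum(len(a[t] & b.get(t, set())) for t in a)
    union = sum(len(a.get(t, set()) | b.get(t, set())) for t in set(a) | set(b))
    return inter / union if union else 0.0


def cooccurrences(events_a, events_b, threshold):
    """Filter with bounding boxes, then refine with the exact volume ratio."""
    boxes_a = {k: bounding_box(v) for k, v in events_a.items()}
    boxes_b = {k: bounding_box(v) for k, v in events_b.items()}
    result = []
    for ka, kb in product(events_a, events_b):
        if not boxes_overlap(boxes_a[ka], boxes_b[kb]):
            continue                                   # filter: cheap pruning
        j = volume_jaccard(events_a[ka], events_b[kb])  # refine: exact measure
        if j >= threshold:
            result.append((ka, kb, j))
    return result
```

The filter never discards a pair with non-zero overlap, because instances that share a cell at some time step must have intersecting bounding boxes; it only saves the expensive refinement for pairs that are far apart.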

international conference on data mining | 2012

Spatio-temporal Co-occurrence Pattern Mining in Data Sets with Evolving Regions

Karthik Ganesan Pillai; Rafal A. Angryk; Juan M. Banda; Michael A. Schuh; Tim Wylie

Spatio-temporal co-occurring patterns represent subsets of event types that occur together in both space and time. In contrast to previous work in this field, we present a general framework to identify spatio-temporal co-occurring patterns for continuously evolving spatio-temporal events that have polygon-like representations. We also propose a set of measures to identify spatio-temporal co-occurring patterns, and an Apriori-based spatio-temporal co-occurrence mining algorithm to find prevalent spatio-temporal co-occurring patterns for extended spatial representations that evolve over time. We evaluate our framework on real-life data to demonstrate the effectiveness of our measures and the algorithm, and we present results highlighting the importance of our measures in identifying spatio-temporal co-occurrence patterns.
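The Apriori-based search mentioned above can be sketched generically: size-k candidate patterns are built only from prevalent size-(k-1) patterns, and a candidate is dropped if any of its (k-1)-subsets is not prevalent. The prevalence function is supplied by the caller (a hypothetical lookup table here); the pruning is only safe if that measure is anti-monotone, and the paper's own measures are not reproduced.

```python
from itertools import combinations


def apriori_patterns(event_types, prevalence, min_prev):
    """Level-wise search for prevalent co-occurrence patterns of size >= 2.

    `prevalence(pattern)` takes a frozenset of event types and returns a
    score; it is assumed to be anti-monotone (supersets never score higher).
    """
    # Size-1 patterns are treated as trivially prevalent.
    level = [frozenset([e]) for e in sorted(set(event_types))]
    found = []
    k = 2
    while len(level) > 1:
        prev = set(level)
        candidates = set()
        for a, b in combinations(level, 2):
            union = a | b
            # Join step plus downward-closure pruning.
            if len(union) == k and all(
                frozenset(sub) in prev for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
        level = [c for c in sorted(candidates, key=sorted)
                 if prevalence(c) >= min_prev]
        found.extend(level)
        k += 1
    return found


# Hypothetical prevalence scores for illustration only.
table = {
    frozenset({"AR", "SS"}): 0.6,
    frozenset({"AR", "FL"}): 0.5,
    frozenset({"SS", "FL"}): 0.4,
    frozenset({"AR", "SS", "FL"}): 0.35,
    frozenset({"AR", "CH"}): 0.1,
}
```

Calling `apriori_patterns(["AR", "SS", "FL", "CH"], lambda p: table.get(p, 0.0), 0.3)` returns the three prevalent pairs followed by the triple; no pattern containing `"CH"` survives past the pair level.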

ieee international conference on fuzzy systems | 2005

Mining Multi-Level Associations with Fuzzy Hierarchies

Rafal A. Angryk; Frederick E. Petry

In this paper, we investigate the application of fuzzy concept hierarchies to mining multi-level knowledge from large datasets via the well-known attribute-oriented induction approach (Han and Kamber, 2000). We analyze the original process of fuzzy hierarchical induction in detail and extend it with two new characteristics that improve the applicability of the original approach to scientific data mining: the consistency of our fuzzy induction model, and an approximate drilling-down technique that allows a user to retrieve estimated explanations of the generated abstract concepts. An application to the discovery of multi-level association rules from environmental data stored in a toxic release inventory is presented.
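One generalization step of attribute-oriented induction over a fuzzy hierarchy can be sketched as distributing each value's count to its parent concepts in proportion to membership degree. The sketch assumes that "consistency" means each child's memberships to its parents sum to 1, so no count is created or lost; that reading, the function names, and the chemical hierarchy below are hypothetical illustrations.

```python
from collections import defaultdict


def check_consistent(hierarchy, tol=1e-9):
    """Assumed consistency rule: each child's parent memberships sum to 1."""
    return all(abs(sum(parents.values()) - 1.0) <= tol
               for parents in hierarchy.values())


def generalize(counts, hierarchy):
    """Move counts one level up a fuzzy concept hierarchy.

    `hierarchy` maps child -> {parent: membership degree}. Values absent
    from the hierarchy are passed through unchanged.
    """
    if not check_consistent(hierarchy):
        raise ValueError("inconsistent fuzzy hierarchy")
    out = defaultdict(float)
    for value, n in counts.items():
        for parent, degree in hierarchy.get(value, {value: 1.0}).items():
            out[parent] += n * degree
    return dict(out)


# Hypothetical toy hierarchy for illustration only.
hierarchy = {
    "benzene": {"organic": 1.0},
    "toluene": {"organic": 0.7, "solvent": 0.3},
    "lead": {"metal": 1.0},
}
```

Because each row of memberships sums to 1, the total count after `generalize` equals the total before it, which is what allows repeated generalization steps without distorting the support of the resulting abstract concepts.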

digital image computing: techniques and applications | 2010

Selection of Image Parameters as the First Step towards Creating a CBIR System for the Solar Dynamics Observatory

Juan M. Banda; Rafal A. Angryk

This work describes the attribute evaluation stage of the ambitious goal of creating a large-scale content-based image retrieval (CBIR) system for solar phenomena in NASA images from the Solar Dynamics Observatory mission. The mission's Atmospheric Imaging Assembly (AIA) generates eight 4096 × 4096 pixel images every 10 seconds, leading to a data transmission rate of approximately 700 gigabytes per day from the AIA component alone (the entire mission is expected to send about 1.5 terabytes of data per day for a minimum of five years). We investigate unsupervised and supervised methods of selecting image parameters and assessing their importance for distinguishing between different types of solar phenomena, using correlation analysis and three supervised attribute evaluation methods. By selecting the most relevant image parameters (out of the twelve tested), we expect to save 540 megabytes per day of storage for each parameter that we remove. We also applied several image filtering algorithms to these images to investigate whether they enhance our classification results. We confirm our experimental results by running multiple classifiers for comparative analysis on the selected image parameters and filters.
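The unsupervised, correlation-based part of parameter selection can be sketched as a greedy filter: keep parameters in a given order and skip any whose absolute Pearson correlation with an already kept parameter reaches a threshold. The 0.9 threshold, the parameter names, and the greedy ordering are assumptions; the 540 MB per removed parameter per day is the figure stated in the abstract.

```python
import math

MB_PER_PARAMETER_PER_DAY = 540  # storage figure quoted in the abstract


def pearson(x, y):
    """Pearson correlation coefficient (0.0 if either series is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx and syy else 0.0


def drop_correlated(params, threshold=0.9):
    """Greedily keep parameters (dict order) not highly correlated with a kept one."""
    kept = []
    for name, values in params.items():
        if all(abs(pearson(values, params[k])) < threshold for k in kept):
            kept.append(name)
    return kept


def daily_savings_mb(params, kept):
    """Storage saved per day by removing the parameters that were not kept."""
    return (len(params) - len(kept)) * MB_PER_PARAMETER_PER_DAY
```

Both strong positive and strong negative correlation count as redundancy here, since either way one parameter can be predicted almost linearly from the other.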

international conference on big data | 2014

Spatiotemporal indexing techniques for efficiently mining spatiotemporal co-occurrence patterns

Berkay Aydin; Dustin Kempton; Vijay Akkineni; Shaktidhar Reddy Gopavaram; Karthik Ganesan Pillai; Rafal A. Angryk

In this paper, we investigate the use of specialized spatiotemporal indexing techniques for mining co-occurrence patterns from spatiotemporal datasets with evolving polygon-based representations. Previously suggested techniques for spatiotemporal pattern mining did not take spatiotemporal indexing into account. We present a new framework for mining spatiotemporal co-occurrence patterns that can use various indexing techniques to access data efficiently. Two well-studied spatiotemporal indexing structures, Scalable and Efficient Trajectory Index (SETI) and Chebyshev Polynomial Indexing, are currently implemented and available in our framework.
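To show roughly how a SETI-style index speeds up data access, here is a simplified sketch loosely following SETI's idea of a static spatial partition with a temporal index inside each partition. Each grid cell keeps a start-time-sorted list of entries, and a query scans only the cells its box touches, stopping early once start times pass the query window. The class name, cell size, and entry layout are assumptions; this is not the framework's implementation, and Chebyshev polynomial indexing is not shown.

```python
import bisect
from collections import defaultdict


class GridTimeIndex:
    """Static spatial grid; each cell holds entries sorted by start time."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(list)

    def _cells_for(self, xmin, ymin, xmax, ymax):
        s = self.cell_size
        for i in range(int(xmin // s), int(xmax // s) + 1):
            for j in range(int(ymin // s), int(ymax // s) + 1):
                yield (i, j)

    def insert(self, obj_id, bbox, t_start, t_end):
        """Register an object's (xmin, ymin, xmax, ymax) box over [t_start, t_end]."""
        for cell in self._cells_for(*bbox):
            bisect.insort(self.cells[cell], (t_start, t_end, obj_id, bbox))

    def query(self, bbox, t_start, t_end):
        """Ids of objects whose box and time interval overlap the query."""
        qx0, qy0, qx1, qy1 = bbox
        hits = set()
        for cell in self._cells_for(*bbox):
            for s, e, obj_id, (x0, y0, x1, y1) in self.cells.get(cell, []):
                if s > t_end:
                    break  # remaining entries start even later
                if e >= t_start and x0 <= qx1 and qx0 <= x1 and y0 <= qy1 and qy0 <= y1:
                    hits.add(obj_id)
        return hits
```

Because objects far from the query box live in other cells, they are never examined, which is the kind of access saving the framework relies on when it evaluates candidate co-occurrences.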