Publication


Featured research published by Kareem Sherif Aggour.


international conference on case based reasoning | 2003

SOFT-CBR: a self-optimizing fuzzy tool for case-based reasoning

Kareem Sherif Aggour; Marc Pavese; Piero P. Bonissone; William Cheetham

A generic Case-Based Reasoning tool has been designed, implemented, and successfully used in two distinct applications. SOFT-CBR can be applied to a wide range of decision problems, independent of the underlying input case data and output decision space. The tool supplements the traditional case base paradigm by incorporating Fuzzy Logic concepts in a flexible, extensible component-based architecture. An Evolutionary Algorithm has also been incorporated into SOFT-CBR to facilitate the optimization and maintenance of the system. SOFT-CBR relies on simple XML files for configuration, enabling its widespread use beyond the software development community. SOFT-CBR has been used in an automated insurance underwriting system and a gas turbine diagnosis system.
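
The abstract does not give implementation details, so the Python sketch below only illustrates the general flavor of fuzzy case retrieval in a CBR engine; the attribute names, weights, and membership widths are invented for illustration and are not taken from SOFT-CBR.

```python
# Minimal sketch of fuzzy case retrieval, loosely in the spirit of a fuzzy CBR
# engine.  Attribute names, weights, and membership widths are hypothetical.

def triangular_similarity(a, b, width):
    """Fuzzy similarity in [0, 1]: 1.0 when values match, falling linearly
    to 0.0 once they differ by `width` or more."""
    return max(0.0, 1.0 - abs(a - b) / width)

def score_case(query, case, weights, widths):
    """Weighted aggregation of per-attribute fuzzy similarities."""
    total = sum(weights.values())
    return sum(
        weights[attr] * triangular_similarity(query[attr], case[attr], widths[attr])
        for attr in weights
    ) / total

# Hypothetical case base: past decisions indexed by their input attributes.
case_base = [
    {"age": 45, "bmi": 27.0, "decision": "standard"},
    {"age": 62, "bmi": 31.5, "decision": "refer"},
    {"age": 38, "bmi": 24.0, "decision": "preferred"},
]
weights = {"age": 0.6, "bmi": 0.4}
widths = {"age": 15.0, "bmi": 6.0}

query = {"age": 50, "bmi": 26.0}
ranked = sorted(case_base,
                key=lambda c: score_case(query, c, weights, widths),
                reverse=True)
print(ranked[0]["decision"])  # decision of the most similar past case
```

In the paper, an evolutionary algorithm optimizes and maintains the system; in a sketch like this, that would roughly correspond to tuning the weights and widths rather than hard-coding them.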


Computational Statistics & Data Analysis | 2006

Design of local fuzzy models using evolutionary algorithms

Piero P. Bonissone; Anil Varma; Kareem Sherif Aggour; Feng Xue

The application of local fuzzy models to determine the remaining life of a unit in a fleet of vehicles is described. Instead of developing individual models based on the track history of each unit or developing a global model based on the collective track history of the fleet, local fuzzy models are used based on clusters of peers (similar units with comparable utilization and performance characteristics). A local fuzzy performance model is created for each cluster of peers. This is combined with an evolutionary framework to maintain the models. A process has been defined to generate a collection of competing models, evaluate their performance in light of the currently available data, refine the best models using evolutionary search, and select the best one after a finite number of iterations. This process is repeated periodically to automatically update and improve the overall model. To illustrate this methodology, an asset selection problem has been identified: given a fleet of industrial vehicles (diesel electric locomotives), select the best subset for mission-critical utilization. To this end, the remaining life of each unit in the fleet is predicted. The fleet is then sorted using this prediction and the highest-ranked units are selected. A series of experiments using data from locomotive operations was conducted and the results from an initial validation exercise are presented. The approach of constructing local predictive models using fuzzy similarity with neighboring points along appropriate dimensions is not specific to any asset type and may be applied to any problem where the premise of similarity along chosen attribute dimensions implies similarity in predicted future behavior.
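
As a rough, hypothetical illustration of the peer-based idea (not the paper's actual model), the sketch below predicts a unit's remaining life as a fuzzy-similarity-weighted average over its peers; the features, data, and similarity kernel are invented, and the evolutionary model maintenance is omitted.

```python
import numpy as np

# Sketch of a peer-based local prediction, assuming (hypothetically) that each
# unit is described by utilization features and a known remaining-life value.
# The evolutionary tuning of the fuzzy model parameters described in the paper
# is omitted; a fixed Gaussian similarity kernel stands in for it.

rng = np.random.default_rng(0)
fleet_features = rng.uniform(0, 1, size=(200, 3))        # e.g. load, mileage, duty cycle
fleet_remaining_life = rng.uniform(1, 10, size=200)      # synthetic, in years

def predict_remaining_life(unit, features, targets, bandwidth=0.2):
    """Fuzzy-similarity-weighted average over the unit's closest peers."""
    dists = np.linalg.norm(features - unit, axis=1)
    weights = np.exp(-(dists / bandwidth) ** 2)           # fuzzy membership in the peer cluster
    return float(np.dot(weights, targets) / weights.sum())

new_unit = np.array([0.4, 0.7, 0.2])
print(predict_remaining_life(new_unit, fleet_features, fleet_remaining_life))
```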


innovative applications of artificial intelligence | 2005

Automating the underwriting of insurance applications

Kareem Sherif Aggour; William Cheetham

An end-to-end system was created at Genworth Financial to automate the underwriting of Long Term Care (LTC) and Life Insurance applications. Relying heavily on Artificial Intelligence techniques, the system has been in production since December 2002 and today completely automates the underwriting of 19.2% of the LTC applications. A fuzzy logic rules engine encodes the underwriter guidelines and an evolutionary algorithm optimizes the engine's performance. Finally, a natural language parser is used to improve the coverage of the underwriting system.
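
The abstract does not disclose the actual underwriting guidelines, so the sketch below only shows, with invented attributes and thresholds, how a fuzzy rules engine might grade an application against guidelines and decide whether to auto-approve it or refer it to a human underwriter.

```python
# Hypothetical sketch of how a fuzzy rules engine might encode a single
# underwriting guideline.  The attributes, thresholds, and rules are invented
# for illustration and are not Genworth's actual guidelines.

def trapezoid(x, a, b, c, d):
    """Trapezoidal membership: 0 below a, ramps up to 1 on [b, c], 0 above d."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)

# Degree to which each guideline is satisfied for one applicant.
applicant = {"systolic_bp": 138, "bmi": 29.0}
rule_satisfaction = {
    "acceptable_blood_pressure": trapezoid(applicant["systolic_bp"], 90, 100, 130, 150),
    "acceptable_bmi": trapezoid(applicant["bmi"], 17, 19, 27, 33),
}

# A conservative aggregation: the application is only as acceptable as its
# weakest guideline (fuzzy AND via min).
overall = min(rule_satisfaction.values())
decision = "auto-approve" if overall >= 0.8 else "refer to human underwriter"
print(round(overall, 2), decision)
```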


ieee international conference on high performance computing data and analytics | 2012

Applying Cluster Computing to Enable a Large-scale Smart Grid Stability Monitoring Application

John Alan Interrante; Kareem Sherif Aggour

The real-time execution of grid stability monitoring algorithms is critical to enabling a truly smart grid. However, the high sampling rate of grid monitoring devices, combined with the large number of devices scattered across a grid, results in very high throughput requirements for the execution of these algorithms. Here we define a centralized hardware and software infrastructure to enable the real-time execution of a small signal oscillation detection algorithm using a cluster of commodity nodes. Our research has demonstrated that readings from up to 500 phasor measurement units (PMUs) sampling at 60 Hz can be analyzed in real time by a single 8-core, 2.53 GHz machine with 8 GB of RAM, and that a cluster of four of these machines can be used to monitor up to 2,000 PMUs in parallel.
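
Using only the throughput figures quoted in the abstract, the sketch below works through the implied per-node and cluster-wide load and shows one simple, assumed way (not the paper's design) to partition PMU streams across nodes.

```python
# Back-of-envelope check of the throughput figures quoted above, plus a simple
# static partitioning of PMU streams across cluster nodes.  The hash-based
# assignment is an illustrative assumption, not the paper's architecture.

SAMPLES_PER_SECOND = 60          # PMU sampling rate
PMUS_PER_NODE = 500              # demonstrated capacity of one 8-core node
NODES = 4

print("per-node load :", PMUS_PER_NODE * SAMPLES_PER_SECOND, "readings/sec")           # 30,000
print("cluster load  :", NODES * PMUS_PER_NODE * SAMPLES_PER_SECOND, "readings/sec")   # 120,000

def assign_node(pmu_id: str, nodes: int = NODES) -> int:
    """Map a PMU stream to a node so each node sees a roughly equal share."""
    return hash(pmu_id) % nodes

partitions = {}
for i in range(2000):
    partitions.setdefault(assign_node(f"pmu-{i}"), []).append(f"pmu-{i}")
print({node: len(ids) for node, ids in sorted(partitions.items())})
```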


international conference on big data | 2015

Semantics for Big Data access & integration: Improving industrial equipment design through increased data usability

Jenny Weisenberg Williams; Paul Edward Cuddihy; Justin McHugh; Kareem Sherif Aggour; Arvind Menon; Steven M. Gustafson; Timothy Healy

With the advent of Big Data technologies, organizations can efficiently store and analyze more data than ever before. However, extracting maximal value from this data can be challenging for many reasons. For example, datasets are often not stored using human-understandable terms, making it difficult for a large set of users to benefit from them. Further, given that different types of data may be best stored using different technologies, datasets that are closely related may be stored separately with no explicit linkage. Finally, even within individual data stores, there are often inconsistencies in data representations, whether introduced over time or due to different data producers. These challenges are further compounded by frequent additions to the data, including new raw data as well as results produced by large-scale analytics. Thus, even within a single Big Data environment, it is often the case that multiple rich datasets exist without the means to access them in a unified and cohesive way, often leading to lost value. This paper describes the development of a Big Data management infrastructure with semantic technologies at its core to provide a unified data access layer and a consistent approach to analytic execution. Semantic technologies were used to create domain models describing mutually relevant datasets and the relationships between them, with a graphical user interface to transparently query across datasets using domain-model terms. This prototype system was built for GE Power & Water's Power Generation Products Engineering Division, which has produced over 50 TB of gas turbine and component prototype test data to date. The system is expected to result in significant savings in productivity and expenditure.
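
As a small illustration of querying across datasets in domain-model terms, the sketch below uses rdflib with an invented turbine-test vocabulary; the namespace, classes, and properties are hypothetical and are not GE's actual domain models.

```python
# A minimal sketch of the "query in domain-model terms" idea using rdflib.
# The namespace, classes, and properties below are invented for illustration.

from rdflib import Graph, Literal, Namespace, RDF

EX = Namespace("http://example.com/turbine#")
g = Graph()

# Two records that might originate in different underlying stores, linked
# through a shared domain model once lifted into the graph.
g.add((EX.test42, RDF.type, EX.PrototypeTest))
g.add((EX.test42, EX.testedComponent, EX.blade7))
g.add((EX.test42, EX.peakTemperatureC, Literal(912.5)))
g.add((EX.blade7, RDF.type, EX.TurbineBlade))
g.add((EX.blade7, EX.material, Literal("single-crystal alloy")))

# A user asks a question in domain terms, without knowing where the data lives.
results = g.query("""
    PREFIX ex: <http://example.com/turbine#>
    SELECT ?test ?material ?temp WHERE {
        ?test a ex:PrototypeTest ;
              ex:testedComponent ?blade ;
              ex:peakTemperatureC ?temp .
        ?blade ex:material ?material .
    }
""")
for row in results:
    print(row.test, row.material, row.temp)
```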


international conference on big data | 2014

Bridging high velocity and high volume industrial big data through distributed in-memory storage & analytics

Jenny Weisenberg Williams; Kareem Sherif Aggour; John Alan Interrante; Justin McHugh; Eric Thomas Pool

With an exponential increase in time series sensor data generated by an ever-growing number of sensors on industrial equipment, new systems are required to efficiently store and analyze this “Industrial Big Data.” To actively monitor industrial equipment there is a need to process large streams of high velocity time series sensor data as it arrives, and then store that data for subsequent analysis. Historically, separate systems would meet these needs, with neither system having the ability to perform fast analytics incorporating both just-arrived and historical data. In-memory data grids are a promising technology that can support both near real-time analysis and mid-term storage of big datasets, bridging the gap between high velocity and high volume big time series sensor data. This paper describes the development of a prototype infrastructure with an in-memory data grid at its core to analyze high velocity (>100,000 points per second), high volume (TBs) time series data produced by a fleet of gas turbines monitored at GE Power & Water's Remote Monitoring & Diagnostics Center.
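
The sketch below illustrates the access pattern described above (a single store serving both the streaming write path and windowed analytic reads) with a plain in-process dictionary; a real deployment would use a distributed in-memory data grid, and the sensor names and sizes here are invented.

```python
# Simplified sketch of the bridging idea: keep recent time series points in
# memory, keyed by sensor, so streaming ingest and analytics hit the same
# store.  This single-process dictionary only illustrates the access pattern.

import time
from collections import defaultdict, deque

class InMemoryTimeSeriesStore:
    def __init__(self, max_points_per_sensor=100_000):
        self._data = defaultdict(lambda: deque(maxlen=max_points_per_sensor))

    def ingest(self, sensor_id, timestamp, value):
        """High-velocity write path: append the newest sample."""
        self._data[sensor_id].append((timestamp, value))

    def window(self, sensor_id, since):
        """Analytic read path: all samples newer than `since` seconds ago."""
        cutoff = time.time() - since
        return [(t, v) for t, v in self._data[sensor_id] if t >= cutoff]

store = InMemoryTimeSeriesStore()
now = time.time()
for i in range(1000):
    store.ingest("turbine-7:exhaust_temp", now - 1000 + i, 500.0 + 0.01 * i)

recent = store.window("turbine-7:exhaust_temp", since=60)
print(len(recent), "samples in the last minute;",
      "mean =", round(sum(v for _, v in recent) / len(recent), 2))
```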


bioinformatics and biomedicine | 2015

A highly parallel next-generation DNA sequencing data analysis pipeline in Hadoop

Kareem Sherif Aggour; Vijay S. Kumar; Dipen Sangurdekar; Lee Aaron Newberg; Chinnappa D. Kodira

The era of precision medicine is best exemplified by the growing reliance on next-generation sequencing (NGS) technologies to provide improved disease diagnosis and targeted therapeutic selection. Well-established NGS data analysis software tools, in their unmodified form, can take days to identify and interpret single nucleotide and structural variations in DNA for a single patient. To improve sample analysis throughput, we developed a highly parallel end-to-end next-generation DNA sequencing data analysis pipeline in Hadoop. In our pipeline, each step is parallelized not only across samples but also within each individual sample, achieving a 30× speedup over a single-server workflow execution. Furthermore, we extensively evaluate the viability of having our Hadoop-based pipeline as part of a larger commercial genomic services offering: we demonstrate how our pipeline scales sub-linearly both with the number of samples being analyzed and with the depth of coverage of those samples. In particular, on our commodity cluster, 10× as many samples resulted in only a 2.24× increase in the execution time, and a 4× increase in coverage depth resulted in only a 2.53× growth in execution time. We anticipate that such improvements will allow large cohort populations to be analyzed in parallel, and can fundamentally change the way DNA sequencing analyses are used by both researchers and clinicians.
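
As a loose analogy to the within-sample parallelism described above, the sketch below splits one sample's work by chromosome and processes the pieces concurrently; Python's multiprocessing stands in for Hadoop, and call_variants is a placeholder, not a real variant caller.

```python
# Sketch of within-sample parallelism: split one sample's variant-calling work
# by chromosome, process the pieces concurrently, then merge.  The Hadoop
# specifics of the actual pipeline are replaced here by multiprocessing, and
# `call_variants` is a stand-in for a real analysis tool.

from multiprocessing import Pool

CHROMOSOMES = [f"chr{i}" for i in range(1, 23)] + ["chrX", "chrY"]

def call_variants(region):
    """Placeholder for per-region analysis (alignment filtering, variant calls)."""
    return region, f"{region}.vcf"     # pretend we produced a partial result file

def analyze_sample(sample_id, workers=8):
    with Pool(workers) as pool:
        partial_results = pool.map(call_variants, CHROMOSOMES)
    # Gather step: merge per-chromosome outputs back into one result per sample.
    return {sample_id: dict(partial_results)}

if __name__ == "__main__":
    print(analyze_sample("patient-001"))
```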


Proceedings of SPIE | 2016

A machine learning approach to quantifying noise in medical images

Aritra Chowdhury; Christopher Sevinsky; Bülent Yener; Kareem Sherif Aggour; Steven M. Gustafson

As advances in medical imaging technology are resulting in significant growth of biomedical image data, new techniques are needed to automate the process of identifying images of low quality. Automation is needed because it is very time consuming for a domain expert such as a medical practitioner or a biologist to manually separate good images from bad ones. While there are plenty of de-noising algorithms in the literature, their focus is on designing filters which are necessary but not sufficient for determining how useful an image is to a domain expert. Thus a computational tool is needed to assign a score to each image based on its perceived quality. In this paper, we introduce a machine learning-based score and call it the Quality of Image (QoI) score. The QoI score is computed by combining the confidence values of two popular classification techniques—support vector machines (SVMs) and Naïve Bayes classifiers. We test our technique on clinical image data obtained from cancerous tissue samples. We used 747 tissue samples that are stained by four different markers (abbreviated as CK15, pck26, E_cad and Vimentin) leading to a total of 2,988 images. The results show that images can be classified as good (high QoI), bad (low QoI) or ugly (intermediate QoI) based on their QoI scores. Our automated labeling is in agreement with the domain experts with a bi-modal classification accuracy of 94%, on average. Furthermore, ugly images can be recovered and forwarded for further post-processing.
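
A minimal sketch of the confidence-combination idea, assuming synthetic features in place of real image descriptors and a simple average of the two classifiers' probabilities; the paper's exact combination rule and bucket thresholds are not specified here, so those choices are illustrative.

```python
# Sketch of combining classifier confidences into a single quality score.
# Image feature extraction is omitted: synthetic 2-D features stand in for
# real image descriptors, and the averaging and thresholds are assumptions.

import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

rng = np.random.default_rng(42)
X_good = rng.normal(loc=1.0, scale=0.5, size=(100, 2))   # features of good images
X_bad = rng.normal(loc=-1.0, scale=0.5, size=(100, 2))   # features of bad images
X = np.vstack([X_good, X_bad])
y = np.array([1] * 100 + [0] * 100)                       # 1 = good quality

svm = SVC(probability=True, random_state=0).fit(X, y)
nb = GaussianNB().fit(X, y)

def quality_of_image(features):
    """QoI-style score: mean of the two classifiers' P(good) estimates."""
    f = np.asarray(features).reshape(1, -1)
    p_svm = svm.predict_proba(f)[0, 1]
    p_nb = nb.predict_proba(f)[0, 1]
    return (p_svm + p_nb) / 2.0

def triage(score, low=0.3, high=0.7):
    """Bucket an image as bad, ugly (borderline, recoverable), or good."""
    return "bad" if score < low else "good" if score > high else "ugly"

for features in ([1.2, 0.9], [0.1, -0.2], [-1.1, -0.8]):
    s = quality_of_image(features)
    print(features, round(s, 3), triage(s))
```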


knowledge discovery and data mining | 2013

Financing lead triggers: empowering sales reps through knowledge discovery and fusion

Kareem Sherif Aggour; Bethany Kniffin Hoogs

Sales representatives must have access to meaningful and actionable intelligence about potential customers to be effective in their roles. Historically, GE Capital Americas sales reps identified leads by manually searching through news reports and financial statements either in print or online. Here we describe a system built to automate the collection and aggregation of information on companies, which is then mined to identify actionable sales leads. The Financing Lead Triggers system comprises three core components that perform information fusion, knowledge discovery and information visualization. Together these components extract raw data from disparate sources, fuse that data into information, and then automatically mine that information for actionable sales leads driven by a combination of expert-defined and statistically derived triggers. A web-based interface provides sales reps access to the company information and sales leads in a single location. The use of the Lead Triggers system has significantly improved the performance of the sales reps, providing them with actionable intelligence that has improved their productivity by 30-50%. In 2010, Lead Triggers provided leads on opportunities that represented over $44B in new deal commitments for GE Capital.
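
The production rules and models are not public, so the sketch below only illustrates the trigger-mining step with invented company records, a couple of hypothetical expert-defined triggers, and one simple statistically derived trigger.

```python
# Sketch of trigger mining over already-fused company records.  The fields,
# thresholds, and example triggers are invented for illustration only.

from statistics import mean, stdev

companies = [
    {"name": "Acme Freight", "debt_maturing_months": 4,  "capex_growth": 0.42, "new_ceo": False},
    {"name": "Bolt Rail",    "debt_maturing_months": 18, "capex_growth": 0.05, "new_ceo": True},
    {"name": "Crate & Co",   "debt_maturing_months": 30, "capex_growth": 0.11, "new_ceo": False},
]

# Expert-defined triggers: simple predicates a sales leader might articulate.
expert_triggers = {
    "debt refinancing window": lambda c: c["debt_maturing_months"] <= 6,
    "leadership change":       lambda c: c["new_ceo"],
}

# A statistically derived trigger: flag companies whose capital-expenditure
# growth is unusually high relative to the peer group.
growth = [c["capex_growth"] for c in companies]
mu, sigma = mean(growth), stdev(growth)
statistical_triggers = {
    "unusual capex growth": lambda c: (c["capex_growth"] - mu) / sigma > 1.0,
}

for company in companies:
    fired = [name for name, rule in {**expert_triggers, **statistical_triggers}.items()
             if rule(company)]
    if fired:
        print(company["name"], "->", fired)
```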


international conference on cluster computing | 2007

Efficient quantum computing simulation through dynamic matrix restructuring and distributed evaluation

Kareem Sherif Aggour; Robert M. Mattheyses; Joseph Shultz

