Justin McHugh
General Electric
Publication
Featured research published by Justin McHugh.
international conference on big data | 2015
Jenny Weisenberg Williams; Paul Edward Cuddihy; Justin McHugh; Kareem Sherif Aggour; Arvind Menon; Steven M. Gustafson; Timothy Healy
With the advent of Big Data technologies, organizations can efficiently store and analyze more data than ever before. However, extracting maximal value from this data can be challenging for many reasons. For example, datasets are often not stored using human-understandable terms, making it difficult for a large set of users to benefit from them. Further, given that different types of data may be best stored using different technologies, datasets that are closely related may be stored separately with no explicit linkage. Finally, even within individual data stores, there are often inconsistencies in data representations, whether introduced over time or due to different data producers. These challenges are further compounded by frequent additions to the data, including new raw data as well as results produced by large-scale analytics. Thus, even within a single Big Data environment, it is often the case that multiple rich datasets exist without the means to access them in a unified and cohesive way, often leading to lost value. This paper describes the development of a Big Data management infrastructure with semantic technologies at its core to provide a unified data access layer and a consistent approach to analytic execution. Semantic technologies were used to create domain models describing mutually relevant datasets and the relationships between them, with a graphical user interface to transparently query across datasets using domain-model terms. This prototype system was built for GE Power & Water's Power Generation Products Engineering Division, which has produced over 50 TB of gas turbine and component prototype test data to date. The system is expected to result in significant savings in productivity and expenditure.
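The core idea of the abstract, a domain model that maps human-understandable terms onto store-specific fields so one query can span otherwise disconnected stores, can be sketched as follows. This is an illustrative toy, not the paper's system; the stores, field names, and domain terms are all invented for the example.

```python
# Two mock data stores with inconsistent, non-human-readable schemas,
# standing in for separately stored but related datasets.
test_results_store = [
    {"uid": "t-001", "asmbly": "GT-9F", "temp_c": 612.4},
    {"uid": "t-002", "asmbly": "GT-7H", "temp_c": 598.1},
]
parts_store = [
    {"part_no": "GT-9F", "mfg_site": "Greenville"},
    {"part_no": "GT-7H", "mfg_site": "Schenectady"},
]

# Domain model: a human-readable term -> the native field holding it in
# each store. "turbine" appears in both stores, acting as the join key.
DOMAIN_MODEL = {
    "turbine": {"tests": "asmbly", "parts": "part_no"},
    "test_temperature": {"tests": "temp_c"},
    "factory": {"parts": "mfg_site"},
}

def query(terms):
    """Answer a query phrased in domain-model terms, transparently
    joining the two backing stores on the shared 'turbine' concept."""
    results = []
    for test in test_results_store:
        turbine = test[DOMAIN_MODEL["turbine"]["tests"]]
        part = next(p for p in parts_store
                    if p[DOMAIN_MODEL["turbine"]["parts"]] == turbine)
        row = {}
        for term in terms:
            mapping = DOMAIN_MODEL[term]
            if "tests" in mapping:
                row[term] = test[mapping["tests"]]
            else:
                row[term] = part[mapping["parts"]]
        results.append(row)
    return results

print(query(["turbine", "factory"]))
# → [{'turbine': 'GT-9F', 'factory': 'Greenville'},
#    {'turbine': 'GT-7H', 'factory': 'Schenectady'}]
```

In the paper's setting the domain models are semantic (ontology-based) rather than a flat dictionary, but the user-facing effect is the same: queries are expressed in domain terms, and the mapping layer resolves them to whichever store holds the data.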
international conference on big data | 2014
Jenny Weisenberg Williams; Kareem Sherif Aggour; John Alan Interrante; Justin McHugh; Eric Thomas Pool
With an exponential increase in time series sensor data generated by an ever-growing number of sensors on industrial equipment, new systems are required to efficiently store and analyze this “Industrial Big Data.” To actively monitor industrial equipment there is a need to process large streams of high velocity time series sensor data as it arrives, and then store that data for subsequent analysis. Historically, separate systems would meet these needs, with neither system having the ability to perform fast analytics incorporating both just-arrived and historical data. In-memory data grids are a promising technology that can support both near real-time analysis and mid-term storage of big datasets, bridging the gap between high velocity and high volume big time series sensor data. This paper describes the development of a prototype infrastructure with an in-memory data grid at its core to analyze high velocity (>100,000 points per second), high volume (TBs) time series data produced by a fleet of gas turbines monitored at GE Power & Water's Remote Monitoring & Diagnostics Center.
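The gap the abstract describes, between a stream processor that sees only just-arrived points and an archive that sees only historical ones, can be illustrated with a minimal in-memory sketch: a bounded hot window for fresh points plus an in-memory retained segment, with a single query spanning both. This is a conceptual toy under invented names (`TimeSeriesGrid`, the `egt` sensor); a real in-memory data grid distributes this across nodes.

```python
from collections import deque

class TimeSeriesGrid:
    """Toy in-memory store bridging near-real-time and historical data."""

    def __init__(self, hot_capacity=5):
        self.hot = deque(maxlen=hot_capacity)  # newest points, bounded
        self.cold = []                         # older points kept for analysis

    def ingest(self, timestamp, sensor, value):
        # Spill the oldest hot point into mid-term storage before it is
        # evicted by the bounded deque.
        if len(self.hot) == self.hot.maxlen:
            self.cold.append(self.hot[0])
        self.hot.append((timestamp, sensor, value))

    def query(self, sensor, t_start, t_end):
        """One query transparently spans just-arrived and historical data."""
        return [(t, s, v) for (t, s, v) in self.cold + list(self.hot)
                if s == sensor and t_start <= t <= t_end]

grid = TimeSeriesGrid(hot_capacity=3)
for t in range(6):
    grid.ingest(t, "egt", 600 + t)  # exhaust-gas-temperature readings

# The result crosses the hot/cold boundary: t=1,2 are cold, t=3,4 are hot.
print(grid.query("egt", 1, 4))
# → [(1, 'egt', 601), (2, 'egt', 602), (3, 'egt', 603), (4, 'egt', 604)]
```

Keeping both tiers in memory is what lets analytics combine fresh and historical data without the handoff latency of writing to, then reading back from, a separate historian.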
very large data bases | 2017
Kareem Sherif Aggour; Jenny Weisenberg Williams; Justin McHugh; Vijay S. Kumar
Most organizations are becoming increasingly data-driven, often processing data from many different sources to enable critical business operations. Beyond the well-addressed challenge of storing and processing large volumes of data, financial institutions in particular are increasingly subject to federal regulations requiring high levels of accountability for the accuracy and lineage of this data. For companies like GE Capital, which maintain data across a globally interconnected network of thousands of systems, it is becoming increasingly challenging to capture an accurate understanding of the data flowing between those systems. To address this problem, we designed and developed a concept lineage tool allowing organizational data flows to be modeled, visualized and interactively explored. This tool has novel features that allow a data flow network to be contextualized in terms of business-specific metadata such as the concept, business, and product for which it applies. Key analysis features have been implemented, including the ability to trace the origination of particular datasets, and to discover all systems where data is found that meets some user-defined criteria. This tool has been readily adopted by users at GE Capital and in a short time has already become a business-critical application, with over 2,200 data systems and over 1,000 data flows captured.
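The origination-tracing feature described above amounts to reverse reachability over a directed graph of systems and data flows. A minimal sketch, with a hypothetical set of financial systems invented for illustration (the tool's actual data model and system names are not shown in the abstract):

```python
from collections import deque

# flows[target] = upstream source systems feeding it (directed edges,
# pointing from each system back to its data sources)
flows = {
    "risk_report": ["ledger_dw"],
    "ledger_dw": ["loans_db", "fx_feed"],
    "loans_db": [],
    "fx_feed": [],
}

def trace_origins(system):
    """Breadth-first walk upstream: every system a dataset residing in
    `system` may have originated from, direct or transitive."""
    seen, queue = set(), deque([system])
    while queue:
        node = queue.popleft()
        for upstream in flows.get(node, []):
            if upstream not in seen:
                seen.add(upstream)
                queue.append(upstream)
    return seen

print(sorted(trace_origins("risk_report")))
# → ['fx_feed', 'ledger_dw', 'loans_db']
```

The contextualization the paper highlights (concept, business, product) would attach metadata to these nodes and edges, so the same traversal can be filtered to, say, only flows carrying a particular concept; the graph search itself stays as simple as above.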
Archive | 2013
Brian Scott Courtney; John Alan Interrante; Kareem Sherif Aggour; Jenny Weisenberg Williams; Ward Linnscott Bowman; Jerry Lin; Sunil Mathur; Justin McHugh
international conference on big data | 2017
Justin McHugh; Paul Edward Cuddihy; Jenny Weisenberg Williams; Kareem Sherif Aggour; Vijay S. Kumar; Varish Mulwad
arXiv: Artificial Intelligence | 2017
Paul Edward Cuddihy; Justin McHugh; Jenny Weisenberg Williams; Varish Mulwad; Kareem Sherif Aggour
Archive | 2016
Sunil Mathur; Justin McHugh; Ryan Cahalane; Ward Linnscott Bowman; Kareem Sherif Aggour; John C. Leppiaho
Archive | 2016
Brian Scott Courtney; Kareem Sherif Aggour; Ward Linnscott Bowman; John Alan Interrante; Sunil Mathur; Justin McHugh; Jenny Weisenberg Williams
Archive | 2016
Sunil Mathur; Ward Linnscott Bowman; Peter Sage; Justin McHugh; Richard A. Carpenter
Archive | 2016
Paul Edward Cuddihy; Ravi Kiran Reddy Palla; Justin McHugh