George F. Zaki
University of Maryland, College Park
Publications
Featured research published by George F. Zaki.
Rapid System Prototyping | 2011
William Plishker; George F. Zaki; Shuvra S. Bhattacharyya; Charles Clancy; John Kuykendall
With higher bandwidth requirements and more complex protocols, software defined radio (SDR) has ever-growing computational demands. SDR applications have different levels of parallelism that can be exploited on multicore platforms, but design and programming difficulties have inhibited the adoption of specialized multicore platforms like graphics processors (GPUs). In this work we propose a new design flow that augments a popular existing SDR development environment (GNU Radio) with a dataflow foundation and a stand-alone GPU-accelerated library. The approach gives an SDR developer the ability to prototype a GPU-accelerated application and explore its design space quickly and effectively. We demonstrate this design flow on a standard SDR benchmark and show that deciding how to utilize a GPU can be non-trivial for even relatively simple applications.
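For readers unfamiliar with the GNU Radio side of this design flow, the sketch below shows the general shape of a GPU-offloaded block: a Python sync block whose work() function routes a trivial kernel (a complex gain) to the GPU via CuPy when it is available and falls back to NumPy otherwise. The block name and kernel are hypothetical and are not taken from the paper's accelerated library; this is only meant to illustrate how a GPU-backed block can be dropped into an existing flowgraph.

import numpy as np
from gnuradio import gr

try:
    import cupy as cp              # GPU backend, if installed
except ImportError:
    cp = None
xp = cp if cp is not None else np  # array module used by the kernel

class gpu_multiply_const(gr.sync_block):
    """Multiply a complex stream by a constant, offloaded to the GPU when possible."""
    def __init__(self, k=2.0):
        gr.sync_block.__init__(self,
                               name="gpu_multiply_const",
                               in_sig=[np.complex64],
                               out_sig=[np.complex64])
        self.k = k

    def work(self, input_items, output_items):
        x = xp.asarray(input_items[0])          # host-to-device copy if on GPU
        y = (self.k * x).astype(xp.complex64)   # the "kernel": a complex gain
        out = output_items[0]
        out[:] = cp.asnumpy(y) if xp is not np else y
        return len(out)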
Signal Processing Systems | 2013
George F. Zaki; William Plishker; Shuvra S. Bhattacharyya; Charles Clancy; John Kuykendall
As the variety of off-the-shelf processors expands, traditional implementation methods for digital signal processing and communication systems are no longer adequate to achieve design objectives in a timely manner. Designers need to easily track changes in computing platforms and apply them efficiently while reusing legacy code and optimized libraries that target specialized features in individual processing units. In this context, we propose an integration workflow to schedule and implement Software Defined Radio (SDR) protocols developed in the GNU Radio environment on heterogeneous multiprocessor platforms. We show how to utilize Single Instruction Multiple Data (SIMD) units provided in Graphics Processing Units (GPUs) along with vector accelerators implemented in General Purpose Processors (GPPs). We augment a popular SDR framework (i.e., GNU Radio) with a library that seamlessly allows offloading of algorithm kernels mapped to the GPU without changing the original protocol description. Experimental results show how our approach can be used to efficiently explore design spaces for SDR system implementation, and examine the overhead of the integrated backend (software component) library.
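As a toy illustration of the offloading idea (the backend library itself is not described in the abstract), the sketch below keeps the "graph" description fixed while a per-actor mapping table picks either a GPP (NumPy) or GPU (CuPy) kernel. The actor name psd_estimator and the run_graph() driver are invented for this example.

import numpy as np
try:
    import cupy as cp
except ImportError:
    cp = None

def spectrum_gpp(x):
    return np.abs(np.fft.fft(x)) ** 2           # vectorized on the GPP

def spectrum_gpu(x):
    X = cp.fft.fft(cp.asarray(x))               # offloaded to the GPU
    return cp.asnumpy(cp.abs(X) ** 2)           # copy the result back to the host

# Mapping decision made outside the protocol description
mapping = {"psd_estimator": spectrum_gpu if cp is not None else spectrum_gpp}

def run_graph(samples):
    # The "graph" is just one actor here; its implementation is chosen
    # by the mapping table, not by editing the graph itself.
    return mapping["psd_estimator"](samples)

if __name__ == "__main__":
    print(run_graph(np.random.randn(1024).astype(np.complex64))[:4])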
Academic Radiology | 2015
Junichi Tokuda; William Plishker; Meysam Torabi; Olutayo Olubiyi; George F. Zaki; Servet Tatli; Stuart G. Silverman; Raj Shekhar; Nobuhiko Hata
RATIONALE AND OBJECTIVES: Accuracy and speed are essential for intraprocedural nonrigid magnetic resonance (MR) to computed tomography (CT) image registration in the assessment of tumor margins during CT-guided liver tumor ablations. Although both accuracy and speed can be improved by limiting the registration to a region of interest (ROI), manual contouring of the ROI prolongs the registration process substantially. To achieve accurate and fast registration without the use of an ROI, we combined a nonrigid registration technique based on volume subdivision with hardware acceleration using a graphics processing unit (GPU). We compared the registration accuracy and processing time of the GPU-accelerated volume subdivision-based nonrigid registration technique to the conventional nonrigid B-spline registration technique.
MATERIALS AND METHODS: Fourteen image data sets of preprocedural MR and intraprocedural CT images for percutaneous CT-guided liver tumor ablations were obtained. Each set of images was registered using the GPU-accelerated volume subdivision technique and the B-spline technique. Manual contouring of the ROI was used only for the B-spline technique. Registration accuracies (Dice similarity coefficient [DSC] and 95% Hausdorff distance [HD]) and total processing time, including contouring of ROIs and computation, were compared using a paired Student t test.
RESULTS: Accuracies of the GPU-accelerated registrations and B-spline registrations, respectively, were 88.3 ± 3.7% versus 89.3 ± 4.9% (P = .41) for DSC and 13.1 ± 5.2 versus 11.4 ± 6.3 mm (P = .15) for HD. Total processing time of the GPU-accelerated registration and B-spline registration techniques was 88 ± 14 versus 557 ± 116 seconds (P < .000000002), respectively; there was no significant difference in computation time despite the difference in the complexity of the algorithms (P = .71).
CONCLUSIONS: The GPU-accelerated volume subdivision technique was as accurate as the B-spline technique and required significantly less processing time. The GPU-accelerated volume subdivision technique may enable the implementation of nonrigid registration into routine clinical practice.
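For reference, the two accuracy metrics quoted above can be computed from binary segmentation masks as in the generic NumPy/SciPy sketch below. This is an illustration of the metrics, not the registration code used in the study, and the voxel spacing is assumed to be supplied by the caller.

import numpy as np
from scipy.ndimage import distance_transform_edt, binary_erosion

def dice(a, b):
    """Dice similarity coefficient of two binary masks."""
    a, b = a.astype(bool), b.astype(bool)
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())

def surface(mask):
    """Boundary voxels of a binary mask."""
    return mask & ~binary_erosion(mask)

def hausdorff95(a, b, spacing=(1.0, 1.0, 1.0)):
    """95th-percentile symmetric surface distance (95% HD), in mm."""
    sa, sb = surface(a.astype(bool)), surface(b.astype(bool))
    da = distance_transform_edt(~sb, sampling=spacing)[sa]   # A surface -> B surface
    db = distance_transform_edt(~sa, sampling=spacing)[sb]   # B surface -> A surface
    return np.percentile(np.hstack([da, db]), 95)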
International Conference on Computer Communications and Networks | 2009
George F. Zaki; Hany M. Elsayed; Hassanein H. Amer; Magdy S. El-Soudani
Ubiquitous computing is increasingly present in our daily lives. An emerging application is environmental monitoring in urban areas. Data gathering in such wireless sensor networks can be performed using uncontrolled mobile sinks in addition to the fixed sinks in order to reduce transmission energy. This method can be highly inefficient, however, because notification of presence by a mobile sink is not guaranteed. We propose an efficient hybrid method for message relaying and load balancing in low-mobility wireless sensor networks. The system uses either a single-hop transmission to a nearby mobile sink or a multi-hop transmission to a far-away fixed node, depending on the predicted sink mobility pattern. Taking a mathematical approach, the system parameters are adjusted so that all sensor nodes dissipate the same amount of energy. Consequently, the problem of losing connectivity due to the fast power drainage of the node closest to the fixed sink is resolved. Numerical results show that the resulting system lifetime exceeds that of classical message-gathering methods.
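The single-hop versus multi-hop decision can be illustrated with the widely used first-order radio energy model, as in the sketch below. The constants, packet size, and expected-cost rule are assumptions chosen for illustration and are not the exact formulation analyzed in the paper.

# Illustrative only: a node compares the expected energy cost of waiting for a
# nearby mobile sink (present with probability p) against relaying over the
# multi-hop path to the fixed sink. All constants are assumed values.
E_ELEC = 50e-9      # J/bit, transceiver electronics energy (assumed)
EPS_AMP = 100e-12   # J/bit/m^2, amplifier energy (assumed)
E_LISTEN = 20e-6    # J, idle listening while waiting for a sink beacon (assumed)

def tx_energy(bits, d):
    return bits * (E_ELEC + EPS_AMP * d ** 2)

def choose_route(bits, d_mobile, d_hop, hops, p_present):
    multi_hop = hops * (tx_energy(bits, d_hop) + bits * E_ELEC)  # tx + rx at each relay
    single_hop = tx_energy(bits, d_mobile)
    # Waiting for the mobile sink costs idle listening; if it never shows up,
    # the packet falls back to the multi-hop path anyway.
    expected_mobile = E_LISTEN + p_present * single_hop + (1 - p_present) * multi_hop
    return "wait for mobile sink" if expected_mobile < multi_hop else "relay to fixed sink"

print(choose_route(bits=4000, d_mobile=30.0, d_hop=20.0, hops=5, p_present=0.6))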
Signal Processing Systems | 2017
George F. Zaki; William Plishker; Shuvra S. Bhattacharyya; Frank Fruth
The dynamic nature of state-of-the-art multicore signal processing systems limits the ability of designers to derive accurate models for the targeted applications. Inaccurate assumptions in the model can lead to inefficient implementations and restrict the runtime re-configuration of these systems. On the other hand, dataflow models have provided powerful techniques to analyze and explore the design space for many classes of signal processing systems. In this context, we develop the Partial Expansion Graph (PEG) as an implementation model where existing dataflow graph analysis is augmented with dynamic adaptation, efficient parallelism utilization, and online reconfiguration based on the measured performance of the targeted applications. In this paper, we develop new methods for scheduling and mapping DSP systems using PEGs. Collectively, these methods tune the amount of data parallelism in the application graph and distribute data- and task-parallel instances over different cores while balancing the load across the available processing units. We enable online adaptation for PEG systems using low-overhead customizable solutions. We demonstrate the utility of our PEG-based scheduling and mapping algorithms through experiments on real application models and various synthetic graphs.
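The load-balancing step can be pictured with a generic greedy longest-processing-time heuristic that spreads data- and task-parallel actor instances over cores, as sketched below. This is an illustration only; the actor names and costs are invented, and the heuristic is not claimed to be the PEG scheduling algorithm from the paper.

# Sketch: map partially expanded actor instances onto cores, always assigning
# the most expensive remaining instance to the least-loaded core.
import heapq

def map_instances(instances, num_cores):
    """instances: list of (actor_name, estimated_cost) after partial expansion."""
    cores = [(0.0, c, []) for c in range(num_cores)]     # (load, core_id, actors)
    heapq.heapify(cores)
    for name, cost in sorted(instances, key=lambda t: -t[1]):
        load, cid, assigned = heapq.heappop(cores)       # least-loaded core
        assigned.append(name)
        heapq.heappush(cores, (load + cost, cid, assigned))
    return sorted(cores, key=lambda t: t[1])

# Example: an FIR actor expanded into four data-parallel instances plus two
# task-parallel actors, mapped onto three cores.
instances = [("fir_0", 4), ("fir_1", 4), ("fir_2", 4), ("fir_3", 4),
             ("fft", 6), ("sink", 1)]
for load, core, actors in map_instances(instances, 3):
    print(f"core {core}: load={load} actors={actors}")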
Signal Processing Systems | 2011
George F. Zaki; William Plishker; Shuvra S. Bhattacharyya; Charles Clancy; John Kuykendall
A variety of multiprocessor architectures have proliferated, even in off-the-shelf computing platforms. To improve performance and productivity on common heterogeneous systems, we have developed a workflow for generating efficient solutions. By starting with a formal description of an application and the mapping problem, we are able to generate a range of designs that efficiently trade off latency and throughput. In this approach, efficient utilization of SIMD cores is achieved by applying extensive block processing in conjunction with efficient mapping and scheduling. We demonstrate our approach through an integration into the GNU Radio environment for software defined radio system design.
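The latency/throughput trade-off exploration can be illustrated by a simple Pareto filter over candidate design points, as in the sketch below. The candidate names and numbers are invented and stand in for mappings with different block-processing (vectorization) factors.

# Sketch: keep only Pareto-optimal (latency, throughput) design points.
def pareto_front(designs):
    """designs: list of (name, latency, throughput); lower latency and
    higher throughput are better."""
    front = []
    for name, lat, thr in designs:
        dominated = any(l <= lat and t >= thr and (l, t) != (lat, thr)
                        for _, l, t in designs)
        if not dominated:
            front.append((name, lat, thr))
    return front

candidates = [("cpu-only", 2.0, 10.0), ("gpu-batch-64", 8.0, 55.0),
              ("gpu-batch-16", 4.0, 40.0), ("gpu-batch-8", 6.0, 30.0)]
print(pareto_front(candidates))   # gpu-batch-8 is dominated by gpu-batch-16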
IEEE Journal of Translational Engineering in Health and Medicine | 2016
George F. Zaki; William Plishker; Wen Li; Junghoon Lee; Harry Quon; John Wong; Raj Shekhar
The images generated during radiation oncology treatments provide a valuable resource to conduct analysis for personalized therapy, outcomes prediction, and treatment margin optimization. Deformable image registration (DIR) is an essential tool in analyzing these images. We are enhancing and examining DIR with the contributions of this paper: 1) implementing and investigating a cloud and graphics processing unit (GPU) accelerated DIR solution and 2) assessing the accuracy and flexibility of that solution on planning computed tomography (CT) with cone-beam CT (CBCT). Registering planning CTs and CBCTs aids in monitoring tumors, tracking body changes, and assuring that the treatment is executed as planned. This provides significant information not only at the level of a single patient but also for an oncology department. However, traditional methods for DIR are usually time-consuming, and manual intervention is sometimes required even for a single registration. In this paper, we present a cloud-based solution to increase data analysis throughput, so that treatment tracking results may be delivered at the time of care. We assess our solution in terms of accuracy and flexibility compared with a commercial tool registering CT with CBCT. The latency of a previously reported mutual information-based DIR algorithm was improved with GPUs for a single registration. This registration consists of rigid registration followed by volume subdivision-based nonrigid registration. In this paper, the throughput of the system was accelerated on the cloud for hundreds of data analysis pairs. Nine clinical cases of head and neck cancer patients were utilized to quantitatively evaluate the accuracy and throughput. Target registration error (TRE) and the structural similarity index were used as evaluation metrics for registration accuracy. The total computation time, consisting of preprocessing the data, running the registration, and analyzing the results, was used to evaluate system throughput. Evaluation showed that the average TRE for GPU-accelerated DIR for each of the nine patients ranged from 1.99 to 3.39 mm, which is lower than the voxel dimension. The total processing time for 282 pairs on an Amazon Web Services cloud consisting of 20 GPU-enabled nodes was less than an hour. Beyond the original registration, the cloud resources also included automatic registration quality checks with minimal impact on timing. Clinical data were utilized in quantitative evaluations, and the results showed that the presented method holds great potential for many high-impact clinical applications in radiation oncology, including adaptive radiotherapy, patient outcome prediction, and treatment margin optimization.
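As a reference for the accuracy figures above, target registration error (TRE) is simply the mean Euclidean distance between corresponding landmarks after registration; a generic computation is sketched below with placeholder landmark coordinates.

import numpy as np

def target_registration_error(fixed_pts, mapped_pts):
    """Both arrays are (N, 3) landmark coordinates in mm; mapped_pts are the
    moving-image landmarks pushed through the recovered deformation."""
    return float(np.mean(np.linalg.norm(fixed_pts - mapped_pts, axis=1)))

fixed = np.array([[10.0, 22.5, 41.0], [35.2, 18.0, 60.3]])   # placeholder values
mapped = np.array([[11.2, 23.0, 40.1], [34.0, 19.5, 61.0]])
print(f"TRE = {target_registration_error(fixed, mapped):.2f} mm")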
Application-Specific Systems, Architectures and Processors | 2012
George F. Zaki; William Plishker; Shuvra S. Bhattacharyya; Frank Fruth
Emerging Digital Signal Processing (DSP) algorithms and wireless communication protocols require dynamic adaptation and online reconfiguration of the implemented systems at runtime. In this paper, we introduce the concept of Partial Expansion Graphs (PEGs) as an implementation model and associated class of scheduling strategies. PEGs are designed to help realize DSP systems in terms of forms and granularities of parallelism that are well matched to the given applications and targeted platforms. PEGs also facilitate derivation of both static and dynamic scheduling techniques, depending on the amount of variability in task execution times and other operating conditions. We show how to implement efficient PEG-based scheduling methods using real-time operating systems, and how to reuse pre-optimized libraries of DSP components within such implementations. Empirical results show that the PEG strategy can 1) achieve significant speedups on a state-of-the-art multicore signal processor platform for static dataflow applications with predictable execution times, and 2) exceed classical scheduling speedups for applications having execution times that can vary dynamically. This ability to handle variable execution times is especially useful as DSP applications and platforms increase in complexity and adaptive behavior, thereby reducing execution time predictability.
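The static-versus-dynamic scheduling trade-off can be pictured with the toy self-timed scheduler below, which uses plain Python threads in place of an RTOS: when actor execution times vary, worker "cores" pull ready firings from a shared queue rather than following a fixed order. Actor names and timings are invented, and the sketch ignores dataflow dependencies.

import queue
import random
import threading
import time

def actor(name):
    time.sleep(random.uniform(0.01, 0.05))    # variable execution time
    return name

ready = queue.Queue()
done = []

def worker(core_id):
    while True:
        firing = ready.get()
        if firing is None:                    # shutdown sentinel
            break
        done.append((core_id, actor(firing))) # run the firing on this "core"
        ready.task_done()

threads = [threading.Thread(target=worker, args=(c,)) for c in range(2)]
for t in threads:
    t.start()
for firing in ["src", "fir_0", "fir_1", "fft", "sink"]:
    ready.put(firing)
ready.join()                                  # wait for all firings to finish
for _ in threads:
    ready.put(None)
for t in threads:
    t.join()
print(done)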
Journal of Medical Imaging | 2016
Xinyang Liu; Sukryool Kang; William Plishker; George F. Zaki; Timothy D. Kane; Raj Shekhar
The purpose of this work was to develop a clinically viable laparoscopic augmented reality (AR) system employing stereoscopic (3-D) vision, laparoscopic ultrasound (LUS), and electromagnetic (EM) tracking to achieve image registration. We investigated clinically feasible solutions to mount the EM sensors on the 3-D laparoscope and the LUS probe. This led to a solution of integrating an externally attached EM sensor near the imaging tip of the LUS probe, only slightly increasing the overall diameter of the probe. Likewise, a solution for mounting an EM sensor on the handle of the 3-D laparoscope was proposed. The spatial image-to-video registration accuracy of the AR system was measured to be 2.59±0.58 mm and 2.43±0.48 mm for the left- and right-eye channels, respectively. The AR system contributed 58-ms latency to stereoscopic visualization. We further performed an animal experiment to demonstrate the use of the system as a visualization approach for laparoscopic procedures. In conclusion, we have developed an integrated, compact, and EM tracking-based stereoscopic AR visualization system, which has the potential for clinical use. The system has been demonstrated to achieve clinically acceptable accuracy and latency. This work is a critical step toward clinical translation of AR visualization for laparoscopic procedures.
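Conceptually, the image-to-video registration above amounts to composing rigid transforms along the tracking chain. The sketch below shows that composition with identity placeholders standing in for the calibration matrices and EM tracker readings; the transform names are illustrative, not taken from the paper.

import numpy as np

def to_h(p):                      # 3-D point -> homogeneous vector
    return np.append(np.asarray(p, float), 1.0)

# Ultrasound image -> LUS sensor -> EM tracker -> laparoscope sensor -> camera.
# All matrices are identity placeholders; real values come from calibration
# and from the EM tracker at runtime.
T_img_to_lus_sensor = np.eye(4)   # ultrasound calibration (placeholder)
T_lus_sensor_to_em = np.eye(4)    # LUS sensor pose from the EM tracker (placeholder)
T_em_to_cam_sensor = np.eye(4)    # inverse of the laparoscope sensor pose (placeholder)
T_cam_sensor_to_cam = np.eye(4)   # hand-eye calibration (placeholder)

def ultrasound_to_camera(p_img):
    T = T_cam_sensor_to_cam @ T_em_to_cam_sensor @ T_lus_sensor_to_em @ T_img_to_lus_sensor
    return (T @ to_h(p_img))[:3]  # point in camera coordinates, ready for overlay

print(ultrasound_to_camera([12.0, 5.0, 0.0]))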
World Congress on Medical Physics and Biomedical Engineering | 2015
Seyoun Park; William Plishker; Adam Robinson; George F. Zaki; Raj Shekhar; T.R. McNutt; Harry Quon; John Wong; Junghoon Lee
A critical requirement of successful adaptive radiotherapy (ART) is knowledge of anatomical changes as well as the actual dose delivered to the patient during the course of treatment. While cone-beam CT (CBCT) is typically used to minimize patient setup error and monitor daily anatomical changes, its poor image quality impedes accurate segmentation of the target structures and dose computation. We developed an integrated ART software platform that combines fast and accurate image registration, segmentation, and dose computation/accumulation methods. The developed platform automatically links patient images, the radiotherapy plan, beam and dosimetric parameters, and daily treatment information, thus providing an efficient ART workflow. Furthermore, to improve the accuracy of deformable image registration (DIR) between the planning CT and daily CBCTs, we iteratively correct CBCT intensities by matching local intensity histograms in conjunction with the DIR process. We tested our DIR method on six head and neck (HN) cancer cases, producing improved registration quality. Our method produced an overall NMI of 0.663 and NCC of 0.987, outperforming conventional methods by 3.8% and 1.9%, respectively. The overall ART process has been validated on two HN cancer cases, showing differences between the planned and the actually delivered dose values. Both the DIR and dose computation modules are accelerated by GPUs, and the computation time for DIR and dose computation at each fraction is 1 min.
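A simplified, global version of the intensity-correction step can be written as histogram matching of the CBCT to the planning CT, followed by the NCC similarity score quoted above. The sketch below is generic NumPy code under those assumptions, not the platform's local, iterative implementation.

import numpy as np

def histogram_match(cbct, ct):
    """Map CBCT intensities so their global distribution matches the CT's."""
    src = cbct.ravel()
    order = np.argsort(src)                       # i-th smallest CBCT value
    src_quantiles = np.linspace(0, 1, src.size)
    ct_sorted = np.sort(ct.ravel())
    ct_quantiles = np.linspace(0, 1, ct_sorted.size)
    matched = np.empty_like(src, dtype=float)
    matched[order] = np.interp(src_quantiles, ct_quantiles, ct_sorted)
    return matched.reshape(cbct.shape)

def ncc(a, b):
    """Normalized cross-correlation, one of the similarity scores quoted above."""
    a = (a - a.mean()) / a.std()
    b = (b - b.mean()) / b.std()
    return float(np.mean(a * b))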