Meilian Xu | Researchain

Publication

Featured research published by Meilian Xu.

international conference on parallel processing | 2007

Image Reconstruction using Microwave Tomography for Breast Cancer Detection on Distributed Memory Machine

Meilian Xu; Abas Sabouni; Parimala Thulasiraman; Sima Noghanian; Stephen Pistorius

After lung cancer, breast cancer is the leading cause of cancer deaths in women. It is also one of the few cancers that can be controlled by asymptomatic screening followed by effective treatment. One recent screening modality under development, microwave tomography, uses the apparent dielectric property contrasts between different breast tissues at microwave frequencies. Microwave tomography relies on a numerical model: image reconstruction consists of iteratively searching over candidate breast structures, applying the numerical model to each candidate, and matching the measured data against the model's computed results. This paper focuses on the finite-difference time-domain (FDTD) method for the numerical model and a genetic algorithm (GA) for the iterative search. FDTD and GA are time-consuming, yet they are data parallel in nature. In this paper, a parallel algorithm integrating GA and FDTD for detecting tumors using the microwave tomography technique is presented. The algorithm is implemented on distributed memory machines using MPI.
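The search loop described in this abstract (propose candidates, run the numerical model, compare with measurements) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `forward_model` is a hypothetical stand-in for the FDTD solver, and the population size, mutation scale, and crossover scheme are all assumptions chosen for brevity.

```python
import random

def forward_model(candidate):
    # Hypothetical stand-in for the FDTD solver: maps candidate material
    # parameters to simulated "measurements" with a simple linear rule.
    return [2.0 * p + 1.0 for p in candidate]

def misfit(candidate, measured):
    # Sum of squared differences between simulated and measured data.
    simulated = forward_model(candidate)
    return sum((s - m) ** 2 for s, m in zip(simulated, measured))

def reconstruct(measured, pop_size=20, generations=50, seed=0):
    rng = random.Random(seed)
    n = len(measured)
    population = [[rng.uniform(0.0, 10.0) for _ in range(n)]
                  for _ in range(pop_size)]
    history = []  # best misfit seen in each generation
    for _ in range(generations):
        # Each misfit evaluation is independent of the others, which is
        # the data-parallel part of the search.
        population.sort(key=lambda c: misfit(c, measured))
        history.append(misfit(population[0], measured))
        parents = population[: pop_size // 2]
        children = [population[0][:]]  # elitism: keep the current best
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n) if n > 1 else 0
            child = a[:cut] + b[cut:]          # one-point crossover
            i = rng.randrange(n)
            child[i] += rng.gauss(0.0, 0.5)    # small Gaussian mutation
            children.append(child)
        population = children
    best = min(population, key=lambda c: misfit(c, measured))
    return best, history

measured = forward_model([3.0, 7.0])  # synthetic "measurement"
best, history = reconstruct(measured)
```

Because the best candidate is carried over unchanged, the recorded misfit never increases from one generation to the next. In a distributed setting, the misfit evaluations would be the natural unit of work to spread across processes.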

Journal of Parallel and Distributed Computing | 2012

Microwave tomography for breast cancer detection on Cell broadband engine processors

Meilian Xu; Parimala Thulasiraman; Sima Noghanian

Microwave tomography (MT) is a safe screening modality that can be used for breast cancer detection. The technique uses the dielectric property contrasts between different breast tissues at microwave frequencies to determine the existence of abnormalities. Our proposed MT approach is an iterative process that involves two algorithms: Finite-Difference Time-Domain (FDTD) and Genetic Algorithm (GA). It is a compute intensive problem: (i) the number of iterations can be quite large to detect small tumors; (ii) many fine-grained computations and discretizations of the object under screening are required for accuracy. In our earlier work, we developed a parallel algorithm for microwave tomography on CPU-based homogeneous, multi-core, distributed memory machines. The performance improvement was limited due to communication and synchronization latencies inherent in the algorithm. In this paper, we exploit the parallelism of microwave tomography on the Cell BE processor. Since FDTD is a numerical technique with regular memory accesses, intensive floating point operations and SIMD type operations, the algorithm can be efficiently mapped on the Cell processor achieving significant performance. The initial implementation of FDTD on Cell BE with 8 SPEs is 2.9 times faster than an eight node shared memory machine and 1.45 times faster than an eight node distributed memory machine. In this work, we modify the FDTD algorithm by overlapping computations with communications during asynchronous DMA transfers. The modified algorithm also orchestrates the computations to fully use data between DMA transfers to increase the computation-to-communication ratio. We see 54% improvement on 8 SPEs (27.9% on 1 SPE) for the modified FDTD in comparison to our original FDTD algorithm on Cell BE. We further reduce the synchronization latency between GA and FDTD by using mechanisms such as double buffering. 
We also propose a performance prediction model based on DMA transfers, number of instructions and operations, the processor frequency and DMA bandwidth. We show that the execution time from our prediction model is comparable (within 0.5 s difference) with the execution time of the experimental results on one SPE.

high performance computing and communications | 2008

Parallel Algorithm Design and Performance Evaluation of FDTD on 3 Different Architectures: Cluster, Homogeneous Multicore and Cell/B.E.

Meilian Xu; Parimala Thulasiraman

Clusters built from single-core systems are cost-effective in terms of performance improvement and availability. However, hardware constraints limit the performance of single-core systems, making it difficult to meet the increasingly high performance requirements of diverse general-purpose applications. A promising solution is the novel multi-core systems, which extend parallelism to the CPU level by integrating multiple processing units on a single die. This paper uses the finite-difference time-domain (FDTD) algorithm as a case study, designing suitable parallel FDTD algorithms for three architectures: distributed-memory machines with single-core processors, shared-memory machines with dual-core processors, and the Cell Broadband Engine (Cell/B.E.) processor with nine heterogeneous cores. The experimental results show that the Cell/B.E. processor using 8 SPEs achieves significant speedups at the processor level: 7.05 over an AMD single-core Opteron processor and 3.37 over an AMD dual-core Opteron processor.
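To show why FDTD lends itself to parallel designs like these, here is a minimal one-dimensional sketch of the leapfrog update in normalized units. It is an illustration only, not the paper's code: the Courant factor of 0.5, the Gaussian source, and the fixed (zero) boundary cells are assumptions, and the papers work with two-dimensional grids.

```python
import math

def fdtd_1d(n_cells, n_steps, source_cell):
    """Run a 1D FDTD simulation in free space (normalized units)."""
    ez = [0.0] * n_cells  # electric field samples
    hy = [0.0] * n_cells  # magnetic field samples, staggered half a cell
    courant = 0.5         # assumed Courant factor, stable in 1D
    for t in range(n_steps):
        # Update E from the neighbouring H values. Each cell depends only
        # on its immediate neighbours, so the loop body is data parallel.
        for k in range(1, n_cells):
            ez[k] += courant * (hy[k - 1] - hy[k])
        # Inject a Gaussian pulse (assumed source shape).
        ez[source_cell] += math.exp(-0.5 * ((t - 30) / 10.0) ** 2)
        # Update H from the freshly updated E values.
        for k in range(n_cells - 1):
            hy[k] += courant * (ez[k] - ez[k + 1])
    return ez, hy

ez, hy = fdtd_1d(n_cells=200, n_steps=100, source_cell=100)
```

Splitting the grid into contiguous blocks gives each processor independent work within a time step, with only the block-edge values exchanged between steps.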

ieee antennas and propagation society international symposium | 2007

Efficient microwave breast imaging technique using parallel finite difference time domain and parallel genetic algorithms

Abas Sabouni; Meilian Xu; Sima Noghanian; Parimala Thulasiraman; Stephen Pistorius

This paper addresses the nonlinear tomographic image reconstruction problem, with particular emphasis on developing efficient numerical algorithms for early breast cancer detection that use parallelism to enhance both the speed and quality of the recovered images. Our goal is to illustrate an effective method of microwave imaging for early-stage breast cancer detection that combines the parallel finite-difference time-domain method (PFDTD) with parallel genetic algorithm (PGA) optimization, implemented with the Message Passing Interface (MPI) library.

computational science and engineering | 2008

HPC for iterative image reconstruction in CT

Cameron Melvin; Meilian Xu; Parimala Thulasiraman

Algebraic Reconstruction Techniques (ART) for computed tomography (CT) have been shown to produce better images with fewer projections, hence reducing the side effects of the carcinogenic nature of X-ray imaging. However, the iterative nature of ART prohibits its commercial use because of the long processing time. Parallel processing on high performance computers (HPC) is one way to speed up the ART algorithm. The work discussed in the literature on parallel computing and CT primarily focuses on algorithms based on Fourier techniques, with a lack of development of parallel approaches for ART techniques. The main reason for this has been the extensive computational requirements of this algorithm. With the boom in information technology and advanced architectures, we show in this paper that the ART algorithm can be parallelized on high performance computers with significant performance gain while maintaining image quality. We examine the efficiency of ART on a shared memory machine available on the Western Canada Research Grid consortium. We show that a 6-processor IBM P-server could reconstruct the same image from 36 angles in approximately 5.038 seconds (1.183 seconds on 36 processors), with an efficiency of 93.35%. In other words, a reconstruction with the parallel algorithm could be done in about the same amount of time as a 180-angle sequential Fourier back projection reconstruction, yielding approximately equivalent image quality, with an 80% reduction in dose.
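For readers unfamiliar with ART, the core iteration is commonly written as a Kaczmarz-style row projection. The sketch below is that textbook form on a tiny dense system, not the paper's parallel implementation. The matrix `A` stands in for the projection geometry and `b` for the measured projections. The relaxation factor and sweep count are assumed values.

```python
def art_sweep(A, b, x, relax=1.0):
    """One pass over all rays (rows of A), projecting x onto each equation."""
    for row, b_i in zip(A, b):
        norm_sq = sum(a * a for a in row)
        if norm_sq == 0.0:
            continue  # a ray that crosses no pixels carries no information
        residual = b_i - sum(a * xj for a, xj in zip(row, x))
        step = relax * residual / norm_sq
        x = [xj + step * a for a, xj in zip(row, x)]
    return x

def art(A, b, sweeps=50, relax=1.0):
    """Iterate sweeps starting from an all-zero image."""
    x = [0.0] * len(A[0])
    for _ in range(sweeps):
        x = art_sweep(A, b, x, relax)
    return x

# Toy consistent system with known solution x = [1, 3].
A = [[2.0, 1.0], [1.0, 3.0]]
b = [5.0, 10.0]
x = art(A, b)
```

Each sweep visits every ray in turn, which is why the processing time grows quickly with image size and why parallel execution is attractive.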

International Journal of Biomedical Imaging | 2011

Mapping iterative medical imaging algorithm on cell accelerator

Meilian Xu; Parimala Thulasiraman

Algebraic reconstruction techniques require about half as many projections as Fourier backprojection methods, which makes them safer in terms of required radiation dose. Algebraic reconstruction technique (ART) and its variant OS-SART (ordered subset simultaneous ART) are techniques that provide faster convergence with comparatively good image quality. However, the prohibitively long processing time of these techniques prevents their adoption in commercial CT machines. Parallel computing is one solution to this problem. With the advent of heterogeneous multicore architectures that exploit data parallel applications, medical imaging algorithms such as OS-SART can be studied to produce increased performance. In this paper, we map OS-SART on the Cell Broadband Engine (Cell BE). We effectively use the architectural features of Cell BE to provide an efficient mapping. The Cell BE consists of one PowerPC processor element (PPE) and eight SIMD coprocessors known as synergistic processor elements (SPEs). The limited memory storage on each of the SPEs makes the mapping challenging. Therefore, we present optimization techniques to efficiently map the algorithm on the Cell BE for improved performance over the CPU version. We compare the performance of our proposed algorithm on Cell BE to that of a Sun Fire X4600, a shared memory machine. The Cell BE is five times faster than an AMD Opteron dual-core processor. The speedup of the algorithm on Cell BE increases with the number of SPEs. We also experiment with various parameters that impact the performance of the algorithm, such as the number of subsets, the number of processing elements, and the number of DMA transfers between main memory and local memory.

international parallel and distributed processing symposium | 2008

Finite-difference time-domain on the cell/B.E. processor

Meilian Xu; Parimala Thulasiraman

Finite-Difference Time-Domain (FDTD) is a kernel used to solve problems in electromagnetics applications such as microwave tomography. It is a data-intensive and computation-intensive problem. However, its computation scheme indicates that an architecture with SIMD support has the potential to bring performance improvement over traditional architectures without SIMD support. The Cell Broadband Engine (Cell/B.E.) processor is an implementation of a heterogeneous multicore architecture. It consists of one conventional microprocessor, the PowerPC Processor Element (PPE), and eight SIMD co-processor elements, the Synergistic Processor Elements (SPEs). One unique feature of an SPE is that it has 128-entry 128-bit uniform registers which support SIMD. Therefore, FDTD may map well onto the Cell/B.E. processor. However, each SPE can directly access only a 256 KB local store (LS) for both instructions and data. The size of the LS is much less than what is needed for an accurate simulation of FDTD, which requires a large number of fine-grained Yee cells. In this paper, we design the algorithm on Cell/B.E. by efficiently using the asynchronous DMA (direct memory access) mechanism available on an SPE to transfer data between its LS and the main memory via the high-bandwidth on-chip bus, the EIB (Element Interconnect Bus). The new algorithm was run on IBM QS20 blades running at 3.2 GHz. For a computation domain of 600 x 600 Yee cells, we achieve an overall speedup of 14.14 over AMD Athlon and 7.05 over AMD Opteron at the processor level.
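A back-of-the-envelope calculation makes the local store constraint concrete. The grid size (600 x 600) and the 256 KB local store come from the abstract. The three field components per cell, single-precision storage, and the 64 KB per-chunk budget are assumptions for illustration only.

```python
LOCAL_STORE_BYTES = 256 * 1024  # per-SPE local store, from the abstract

def field_bytes(nx, ny, components=3, bytes_per_value=4):
    # Assumed layout: three field components per Yee cell, 4-byte floats.
    return nx * ny * components * bytes_per_value

def rows_per_chunk(ny, budget_bytes, components=3, bytes_per_value=4):
    # How many full grid rows fit in a given local-store budget.
    return budget_bytes // (ny * components * bytes_per_value)

total = field_bytes(600, 600)           # bytes for the whole grid
ratio = total / LOCAL_STORE_BYTES       # how many local stores that is
chunk_rows = rows_per_chunk(600, 64 * 1024)  # assumed 64 KB data budget
```

Under these assumptions the field data alone is more than sixteen times the size of one local store, before counting instructions or material coefficients, so the grid has to be streamed through the SPE in small row blocks via DMA.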

international parallel and distributed processing symposium | 2007

A Parallel Algorithmic Approach for Microwave Tomography in Breast Cancer Detection

Meilian Xu; Abas Sabouni; Parimala Thulasiraman; Sima Noghanian; Stephen Pistorius

Several technologies are used clinically for breast cancer detection, but they have weaknesses in terms of sensitivity and specificity. Microwave imaging, in contrast, uses the apparent dielectric property contrasts between different breast tissues at microwave frequencies and is a promising direction for finding small tumors at an early stage. Microwave tomography is one category of microwave imaging technique. It has two main components for detecting abnormalities in breasts: a genetic algorithm (GA) and the finite-difference time-domain (FDTD) method. Both GA and FDTD are time-consuming, but they are data-parallel in nature. In this paper, we design a parallel framework for microwave tomography: parallel GA combined with parallel FDTD. The algorithms are implemented on distributed memory machines running MPI. The execution time of the sequential algorithm (GA and FDTD combined) is 10,131 seconds, whereas the total execution time on 16 processors is approximately 2000 seconds.
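The two timings reported in this abstract can be turned into the usual speedup and parallel efficiency figures. The sketch below uses the standard definitions (speedup as sequential time over parallel time, efficiency as speedup per processor). Since the parallel time is given only as "approximately 2000 seconds", the results are approximate as well.

```python
def speedup(t_sequential, t_parallel):
    # Ratio of sequential to parallel execution time.
    return t_sequential / t_parallel

def efficiency(t_sequential, t_parallel, processors):
    # Speedup divided by the number of processors used.
    return speedup(t_sequential, t_parallel) / processors

s = speedup(10131.0, 2000.0)         # roughly 5.07
e = efficiency(10131.0, 2000.0, 16)  # roughly 0.32
```

A speedup of about 5 on 16 processors corresponds to an efficiency of about 32%.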

high performance computing systems and applications | 2008