Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Richard Platania is active.

Publication


Featured research published by Richard Platania.


International Conference on Bioinformatics | 2017

Automated Breast Cancer Diagnosis Using Deep Learning and Region of Interest Detection (BC-DROID)

Richard Platania; Shayan Shams; Seungwon Yang; Jian Zhang; Kisung Lee; Seung-Jong Park

Detection of suspicious regions in mammogram images and the subsequent diagnosis of these regions remains a challenging problem in the medical world. There still exists an alarming rate of misdiagnosis of breast cancer, resulting in both overtreatment through false positive diagnoses and undertreatment through overlooked cancerous masses. Convolutional neural networks have shown strong applicability to various image datasets, enabling detailed features to be learned from the data and, as a result, the ability to classify these images at extremely low error rates. To overcome the difficulty of diagnosing breast cancer from mammogram images, we propose a framework for automated breast cancer detection and diagnosis, called BC-DROID, which provides automated region of interest detection and diagnosis using convolutional neural networks. BC-DROID first pretrains on physician-defined regions of interest in mammogram images, then trains on the full mammogram images. The resulting network can detect and classify regions of interest as cancerous or benign in one step. We demonstrate our framework's ability both to locate regions of interest and to diagnose them. It achieves a detection accuracy of up to 90% and a classification accuracy of 93.5% (AUC of 92.315%). To the best of our knowledge, this is the first work enabling both automated detection and diagnosis of these areas in one step from full mammogram images. Using our framework's website, a user can upload a single mammogram image, visualize suspicious regions, and receive automated diagnoses of these regions.
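
The two-stage training idea lends itself to a compact illustration. Below is a minimal sketch in PyTorch, using adaptive pooling so the same network can accept both ROI patches and full images; the layer sizes, training loop, and the commented-out data loaders are illustrative assumptions, not the published BC-DROID architecture.

```python
# Sketch of BC-DROID's two-stage idea: pretrain a CNN on physician-defined
# ROI patches, then fine-tune on full mammograms. Illustrative only; the
# layer sizes and training details are assumptions, not the authors' model.
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    def __init__(self, num_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1),  # lets ROI patches and full images share one net
        )
        self.classifier = nn.Linear(32, num_classes)  # cancerous vs. benign

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

def train_stage(model, loader, epochs=1, lr=1e-3):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for images, labels in loader:
            opt.zero_grad()
            loss_fn(model(images), labels).backward()
            opt.step()

model = SmallCNN()
# Stage 1: pretrain on ROI patches; Stage 2: fine-tune on full images.
# (hypothetical loaders:)
# train_stage(model, roi_patch_loader)
# train_stage(model, full_image_loader)
```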


Proceedings of the XSEDE16 Conference on Diversity, Big Data, and Science at Scale | 2016

BIC-LSU: Big Data Research Integration with Cyberinfrastructure for LSU

Chui-Hui Chiu; Nathan Lewis; Dipak Kumar Singh; Arghya Kusum Das; Mohammad M. Jalazai; Richard Platania; Sayan Goswami; Kisung Lee; Seung-Jong Park

In recent years, big data analysis has been widely applied to many research fields, including biology, physics, transportation, and materials science. Even though the demands for big data migration and analysis are dramatically increasing in campus IT infrastructures, several technical challenges need to be addressed. First, frequent big data transmission between storage systems in different research groups imposes a heavy burden on a regular campus network. Second, the current campus IT infrastructure is not designed to fully utilize the hardware capacity for big data migration and analysis. Last but not least, running big data applications on top of large-scale high-performance computing facilities is not straightforward, especially for researchers and engineers in non-IT disciplines. We develop a campus IT cyberinfrastructure for big data migration and analysis, called BIC-LSU, which consists of a task-aware Clos OpenFlow network, high-performance cache storage servers, customized high-performance transfer applications, a lightweight control framework that manipulates existing big data storage systems and job scheduling systems, and a comprehensive social networking-enabled web portal. BIC-LSU achieves 40 Gb/s disk-to-disk big data transmission, maintains short average transmission task completion times, enables converged control of commonly deployed storage and job scheduling systems, and eases big data analysis with a universal user-friendly interface. The BIC-LSU software requires minimal dependencies and is highly extensible. Other research institutes can easily customize and deploy BIC-LSU as an augmented service on their existing IT infrastructure.
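
High-performance transfer tools of this kind typically move many byte ranges of a file concurrently. Below is a minimal sketch of that multi-stream pattern, using local file copies as a self-contained stand-in for striped TCP streams to cache servers; the chunk size and stream count are assumed tuning values, not BIC-LSU's.

```python
# Sketch of multi-stream bulk transfer: split a large file into byte
# ranges and move the ranges concurrently. Local file copy stands in
# for the network so the example runs anywhere.
import os
from concurrent.futures import ThreadPoolExecutor

CHUNK = 64 * 1024 * 1024  # 64 MiB per stream; an assumed tuning value

def copy_range(src, dst, offset, length):
    with open(src, "rb") as fin, open(dst, "r+b") as fout:
        fin.seek(offset)
        fout.seek(offset)
        fout.write(fin.read(length))

def parallel_copy(src, dst, streams=8):
    size = os.path.getsize(src)
    with open(dst, "wb") as f:  # preallocate the destination file
        f.truncate(size)
    ranges = [(off, min(CHUNK, size - off)) for off in range(0, size, CHUNK)]
    with ThreadPoolExecutor(max_workers=streams) as pool:
        for off, length in ranges:
            pool.submit(copy_range, src, dst, off, length)
```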


Advanced Information Networking and Applications | 2015

Enabling Large-Scale Biomolecular Conformation Search with Replica Exchange Statistical Temperature Molecular Dynamics (RESTMD) over HPC and Cloud Computing Resources

Nayong Kim; Richard Platania; Wei Huang; Chris Knight; T. Keyes; Seung-Jong Park; Joohyun Kim

We present the latest development and experimental simulation studies of Statistical Temperature Molecular Dynamics (STMD) and its parallel tempering version, Replica Exchange Statistical Temperature Molecular Dynamics (RESTMD). Our main contributions are i) a new implementation of STMD in LAMMPS, ii) use of large-scale distributed cyberinfrastructure, including Amazon EC2 and the nationwide distributed computing infrastructure GENI, in addition to high-performance computing (HPC) cluster systems, and iii) benchmark and simulation results highlighting the advantages and potential of STMD and RESTMD for challenging large-scale biomolecular conformational search. In this work, we attempt to provide convincing evidence that RESTMD, combining two advanced sampling protocols, STMD and the replica exchange algorithm, offers various advantages over not only conventional ineffective approaches but also other enhanced sampling methods. Interestingly, RESTMD has benefits over the most popular method, Replica Exchange Molecular Dynamics (REMD), as an application that maximizes capacity in HPC environments. For example, RESTMD alleviates the need for the large number of replicas that is unavoidable in REMD, and it is flexible enough to exploit the maximum amount of available computing power of a cluster system. Continuing our recent effort, in which RESTMD was implemented with the community molecular dynamics package CHARMM and Hadoop MapReduce, we report the latest development outcomes. First, we plugged the STMD implementation into LAMMPS, one of the most popular public molecular dynamics packages, which should make STMD and RESTMD appealing to investigators from a broad range of life science fields. Second, Hadoop MapReduce-based RESTMD is now able to run on Amazon EC2 and the nationwide network virtual organization, the GENI distributed computing environment. Third, to find optimized parameters for RESTMD simulations, we obtained and present simulation results for two test systems, water and solvated crambin. Despite the relatively small system sizes and short-timescale trajectories, these results underscore the merits and potential of STMD and RESTMD, in terms of both algorithmic advantages and efficient utilization of distributed resources.
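
The exchange step at the core of these methods can be illustrated compactly. The sketch below shows the standard parallel-tempering Metropolis acceptance test between two replicas; STMD replaces the fixed-temperature weights with statistical-temperature estimates, which is not modeled here.

```python
# Sketch of the replica exchange (parallel tempering) swap test:
# neighboring replicas at different temperatures periodically attempt
# to exchange configurations with a Metropolis acceptance criterion.
import math
import random

def attempt_swap(energy_i, energy_j, temp_i, temp_j, k_b=1.0):
    """Return True if replicas i and j should exchange configurations."""
    # Acceptance probability: min(1, exp[(beta_i - beta_j)(E_i - E_j)])
    delta = (1.0 / (k_b * temp_i) - 1.0 / (k_b * temp_j)) * (energy_i - energy_j)
    return delta >= 0 or random.random() < math.exp(delta)
```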


Proceedings of the 6th International Workshop on Science Gateways (IWSG '14) | 2014

MapReduce-Based RESTMD: Enabling Large-Scale Sampling Tasks with Distributed HPC Systems

Praveenkumar Kondikoppa; Richard Platania; Seung-Jong Park; T. Keyes; Jaegil Kim; Nayong Kim; Joohyun Kim; Shuju Bai

A novel implementation of Replica Exchange Statistical Temperature Molecular Dynamics (RESTMD), a generalized ensemble method also known as parallel tempering, is presented. Our implementation employs a MapReduce (MR)-based iterative framework for launching RESTMD over high-performance computing (HPC) clusters, including our testbed system, Cyberinfrastructure for Reconfigurable Optical Networks (CRON), which simulates a network-connected distributed system. Our main contribution is a new implementation of STMD plugged into the well-known CHARMM molecular dynamics package, together with a Hadoop-powered RESTMD implementation that scales out effectively both within a cluster and across distributed systems. To address the challenges of using Hadoop MapReduce, we examined the factors contributing to the performance of the proposed framework through runtime analysis experiments with two biological systems of different sizes over different types of HPC resources. The many advantages of RESTMD suggest its effectiveness for enhanced sampling, one of the grand challenges in areas ranging from chemical systems to statistical inference. Lastly, with its support for scale-across capacity over distributed computing infrastructure (DCI) and its use of Hadoop for coarse-grained task-level parallelism, MapReduce-based RESTMD is a good example of the next generation of applications increasingly demanded by science gateway projects, in particular those backed by IaaS clouds.
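
The iterative map-then-reduce structure can be sketched as follows: each "map" task advances one replica by a block of MD steps, and a single "reduce" gathers replica states to attempt temperature swaps before the next round. Python's multiprocessing stands in for Hadoop, run_md_block is a hypothetical placeholder for a CHARMM invocation, and the swap test is a stand-in for the Metropolis rule sketched above.

```python
# Sketch of the MapReduce-based RESTMD iteration pattern.
from multiprocessing import Pool
import random

def run_md_block(state):
    # Placeholder: the real framework shells out to the MD engine here.
    replica_id, temp, energy = state
    return replica_id, temp, energy + random.gauss(0, 1)

def reduce_swaps(states):
    states.sort(key=lambda s: s[1])          # order replicas by temperature
    for i in range(0, len(states) - 1, 2):   # attempt neighbor swaps
        if random.random() < 0.5:            # stand-in for the Metropolis test
            ti, tj = states[i][1], states[i + 1][1]
            states[i] = (states[i][0], tj, states[i][2])
            states[i + 1] = (states[i + 1][0], ti, states[i + 1][2])
    return states

if __name__ == "__main__":
    states = [(i, 300 + 20 * i, 0.0) for i in range(8)]  # (id, T, energy)
    with Pool() as pool:
        for _ in range(10):                  # iterative map -> reduce rounds
            states = reduce_swaps(pool.map(run_md_block, states))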


Journal of Bioinformatics and Computational Biology | 2017

Large-scale parallel genome assembler over cloud computing environment

Arghya Kusum Das; Praveen Kumar Koppa; Sayan Goswami; Richard Platania; Seung-Jong Park

The size of high-throughput DNA sequencing data has already reached the terabyte scale. To manage this huge volume of data, many downstream sequencing applications have started using locality-based computing over different cloud infrastructures to take advantage of elastic (pay-as-you-go) resources at lower cost. However, the locality-based programming model (e.g., MapReduce) is relatively new. Consequently, developing scalable data-intensive bioinformatics applications using this model, and understanding the hardware environment such applications require for good performance, both need further research. In this paper, we present a de Bruijn graph-oriented Parallel Giraph-based Genome Assembler (GiGA), as well as the hardware platform required for its optimal performance. GiGA uses the power of Hadoop (MapReduce) and Giraph (large-scale graph analysis) to achieve high scalability over hundreds of compute nodes by collocating computation and data. GiGA achieves significantly higher scalability, with competitive assembly quality, compared to contemporary parallel assemblers (e.g., ABySS and Contrail) over a traditional HPC cluster. Moreover, we show that GiGA's performance is significantly improved by using an SSD-based private cloud infrastructure rather than a traditional HPC cluster: its performance on 256 cores of the SSD-based cloud infrastructure closely matches that of 512 cores of the traditional HPC cluster.
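
At the core of GiGA is a de Bruijn graph, in which reads are decomposed into overlapping k-mers and each (k-1)-mer overlap becomes a directed edge. Below is a minimal single-machine sketch of the construction; in GiGA the graph is partitioned across Giraph workers, which is not shown.

```python
# Sketch of de Bruijn graph construction from sequencing reads.
from collections import defaultdict

def build_de_bruijn(reads, k=5):
    graph = defaultdict(set)  # (k-1)-mer -> set of successor (k-1)-mers
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph

reads = ["ACGTACGTGG", "GTACGTGGAT"]
for node, succs in build_de_bruijn(reads).items():
    print(node, "->", sorted(succs))
```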


Concurrency and Computation: Practice and Experience | 2017

Hadoop-based replica exchange over heterogeneous distributed cyberinfrastructures (HaRE)

Richard Platania; Shayan Shams; Chui-Hui Chiu; Nayong Kim; Joohyun Kim; Seung-Jong Park

We present Hadoop-based replica exchange (HaRE), a Hadoop-based implementation of the replica exchange scheme developed primarily for replica exchange statistical temperature molecular dynamics (RESTMD), an example of a large-scale, advanced sampling molecular dynamics simulation. By using Hadoop as a framework and the MapReduce model for driving replica exchange, efficient task-level parallelism is introduced to RESTMD simulations. To demonstrate this, we investigate the performance of our application over various distributed cyberinfrastructures (DCI), including several high-performance computing systems, our Cyberinfrastructure for Reconfigurable Optical Networks (CRON) testbed, the Global Environment for Network Innovations (GENI) testbed, and the CloudLab testbed. Scalability analysis covers scale-out and scale-up over a single high-performance computing cluster, EC2, and CloudLab, and scale-across with CRON and GENI. As a result, we demonstrate that HaRE is capable of efficient execution over both homogeneous and heterogeneous DCI of varying size and configuration. Contributing factors to performance are discussed to provide insight into the effects of the computing environment on the execution of HaRE. With these contributions, we propose that similar loosely coupled scientific applications can also take advantage of the scalable, task-level parallelism that Hadoop MapReduce provides over various DCI.
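
The scale-out and scale-up analysis in studies like this reduces to speedup and parallel efficiency computed from measured wall-clock times. A minimal sketch follows; the timing numbers are made up for illustration, not results from the paper.

```python
# Sketch of the standard scalability metrics: speedup and efficiency
# relative to a baseline core count.
def speedup(t_base, t_n):
    return t_base / t_n

def efficiency(t_base, t_n, n_base, n):
    return speedup(t_base, t_n) * n_base / n

timings = {16: 1000.0, 32: 540.0, 64: 300.0}  # cores -> seconds (made up)
for cores, t in sorted(timings.items()):
    print(cores, round(speedup(timings[16], t), 2),
          round(efficiency(timings[16], t, 16, cores), 2))
```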


Medical Image Computing and Computer Assisted Intervention | 2018

Deep Generative Breast Cancer Screening and Diagnosis

Shayan Shams; Richard Platania; Jian Zhang; Joohyun Kim; Seung-Jong Park

Mammography is the primary modality for breast cancer screening, aiming to reduce breast cancer mortality risk through early detection. However, robust screening that is less hampered by misdiagnoses remains a challenge. Deep learning methods have shown strong applicability to various medical image datasets, primarily thanks to their powerful feature learning capability. Such successful applications are, however, often overshadowed by limitations in real medical settings, dependency on lesion annotations, and discrepancies in data types between training and other datasets. To address these critical challenges, we developed DiaGRAM (Deep GeneRAtive Multi-task), which is built upon a combination of Convolutional Neural Networks (CNN) and Generative Adversarial Networks (GAN). The enhanced feature learning with the GAN, incorporated into hybrid training on both regions of interest (ROI) and whole images, yields higher classification performance and an effective end-to-end scheme. DiaGRAM is capable of robust prediction, even on a small dataset without lesion annotations, via its transfer learning capacity. DiaGRAM achieves an AUC of 88.4% on DDSM, and 92.5% on the challenging, small-sized INbreast dataset.
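
The CNN-GAN combination can be pictured as a discriminator backbone shared by two heads: an adversarial real/fake head and a diagnosis head. Below is a minimal PyTorch sketch of that multi-task layout; the layer sizes, loss weighting, and head design are illustrative assumptions, not the published DiaGRAM model, and the generator is omitted.

```python
# Sketch of a discriminator shared between an adversarial head and a
# classification head, the multi-task pattern DiaGRAM builds on.
import torch
import torch.nn as nn

class SharedDiscriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.real_fake = nn.Linear(64, 1)   # adversarial head (vs. generator)
        self.diagnosis = nn.Linear(64, 2)   # benign vs. malignant head

    def forward(self, x):
        h = self.backbone(x)
        return self.real_fake(h), self.diagnosis(h)

def multitask_loss(rf_logit, dx_logit, is_real, label, alpha=0.5):
    # is_real: float tensor shaped like rf_logit; label: long tensor (N,).
    # The 50/50 weighting is an assumption, not the paper's.
    adv = nn.functional.binary_cross_entropy_with_logits(rf_logit, is_real)
    cls = nn.functional.cross_entropy(dx_logit, label)
    return alpha * adv + (1 - alpha) * cls
```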


International Conference on Bioinformatics | 2018

A Distributed Semi-Supervised Platform for DNase-Seq Data Analytics using Deep Generative Convolutional Networks

Shayan Shams; Richard Platania; Joohyun Kim; Jian Zhang; Kisung Lee; Seungwon Yang; Seung-Jong Park

We present a deep learning approach for analyzing DNase-seq datasets, with promising potential for unraveling the biological underpinnings of transcription regulation mechanisms. Further understanding of these mechanisms can lead to important advances in the life sciences in general, and in drug and biomarker discovery and cancer research in particular. Motivated by recent remarkable advances in the field of deep learning, we developed a platform, Deep Semi-Supervised DNase-seq Analytics (DSSDA). Primarily empowered by deep generative Convolutional Networks (ConvNets), its most notable aspect is the capability for semi-supervised learning, which is highly beneficial in common biological settings often plagued by an insufficient amount of labeled data. In addition, we investigated a k-mer based continuous vector space representation, attempting to further improve learning power by accounting for the nature of biological sequences, in particular features arising from locality-based relationships between neighboring nucleotides. DSSDA employs a modified Ladder Network as the underlying generative model architecture, and its performance is demonstrated on a cell type classification task using sequences from large-scale DNase-seq experiments. We report the performance of DSSDA in both a fully supervised setting, in which it outperforms widely known ConvNet models (94.6% classification accuracy), and a semi-supervised setting, in which, even with less than 10% of the labeled data, it performs comparably to other ConvNets trained on the full dataset. Our results underscore the need for better deep learning methods that learn latent features and representations from challenging genomic sequence datasets.
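
The k-mer based continuous representation can be sketched briefly: slide a window over a DNA sequence to produce overlapping k-mer tokens, then map each token to a learned vector via an embedding table. The value of k and the embedding dimension below are illustrative assumptions, not DSSDA's settings.

```python
# Sketch of k-mer tokenization plus a continuous vector representation.
import torch
import torch.nn as nn
from itertools import product

K = 3
KMER_INDEX = {"".join(p): i for i, p in enumerate(product("ACGT", repeat=K))}

def kmer_tokens(seq, k=K):
    # Overlapping windows preserve locality between neighboring nucleotides.
    return [KMER_INDEX[seq[i:i + k]] for i in range(len(seq) - k + 1)]

embedding = nn.Embedding(num_embeddings=4 ** K, embedding_dim=16)
ids = torch.tensor(kmer_tokens("ACGTAC"))
vectors = embedding(ids)   # one 16-dim vector per k-mer
print(vectors.shape)       # torch.Size([4, 16])
```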


International Conference on Cloud Computing | 2017

Augmenting Amdahl's Second Law: A Theoretical Model to Build Cost-Effective Balanced HPC Infrastructure for Data-Driven Science

Arghya Kusum Das; Jae-Ki Hong; Sayan Goswami; Richard Platania; Kisung Lee; Wooseok Chang; Seung-Jong Park; Ling Liu

High-performance analysis of big data demands more computing resources, forcing similar growth in computation cost. The challenge for HPC system designers is therefore to provide not just high performance, but high performance at lower cost. For high-performance yet cost-effective cyberinfrastructure, we propose a new system model that augments Amdahl's second law for balanced systems to optimize the price-performance ratio. We express the optimal balance among CPU speed, I/O bandwidth, and DRAM size (i.e., Amdahl's I/O and memory numbers) in terms of application characteristics and hardware cost. Considering Xeon processors and recent hardware prices, we show that a system needs almost 0.17 GB/s of I/O bandwidth and 3 GB of DRAM per GHz of CPU speed to minimize the price-performance ratio for data- and compute-intensive applications. To substantiate our claim, we evaluate three different cluster architectures: 1) SupermikeII, a traditional HPC cluster; 2) SwatIII, a regular datacenter; and 3) CeresII, a novel MicroBrick-based hyperscale system. CeresII, with 6 Xeon D-1541 cores (2 GHz/core), one NVMe SSD (2 GB/s I/O bandwidth), and 64 GB of DRAM per node, closely resembles the optimum produced by our model. Consequently, in terms of price-performance ratio, CeresII outperformed both SupermikeII (by 65-85%) and SwatIII (by 40-50%) on data- and compute-intensive Hadoop benchmarks (TeraSort and WordCount) and on our own benchmark genome assembler based on Hadoop and Giraph.
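
The stated balance figures are easy to apply to a concrete node. The sketch below checks a node specification against the 0.17 GB/s and 3 GB per GHz figures from the abstract, using the CeresII configuration as input; the "balance gap" arithmetic is our illustration, not the paper's full cost model.

```python
# Sketch: check a node spec against the abstract's balance figures.
IO_PER_GHZ = 0.17   # GB/s of I/O bandwidth per GHz (from the abstract)
RAM_PER_GHZ = 3.0   # GB of DRAM per GHz (from the abstract)

def balance_gap(cores, ghz_per_core, io_gbps, dram_gb):
    total_ghz = cores * ghz_per_core
    return {
        "io_needed_GBps": total_ghz * IO_PER_GHZ,
        "io_have_GBps": io_gbps,
        "dram_needed_GB": total_ghz * RAM_PER_GHZ,
        "dram_have_GB": dram_gb,
    }

# CeresII node: 6 Xeon D-1541 cores at 2 GHz, NVMe at ~2 GB/s, 64 GB DRAM.
# 12 GHz total -> needs ~2.04 GB/s I/O and 36 GB DRAM, close to the spec.
print(balance_gap(cores=6, ghz_per_core=2.0, io_gbps=2.0, dram_gb=64))
```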


International Conference on Big Data | 2016

Lazer: Distributed memory-efficient assembly of large-scale genomes

Sayan Goswami; Arghya Kusum Das; Richard Platania; Kisung Lee; Seung-Jong Park

Genome sequencing technology has witnessed tremendous progress in terms of throughput as well as cost per base pair, resulting in an explosion in data size. Consequently, typical sequence assembly tools demand a lot of processing power and memory and are unable to assemble big datasets unless run on hundreds of nodes. In this paper, we present a distributed assembler that achieves both scalability and memory efficiency by using partitioned de Bruijn graphs. By improving memory-to-disk swapping and reducing network communication in the cluster, we can assemble large sequences such as human genomes (452 GB) on just two nodes in 14.5 hours, and also scale out to 128 nodes in 23 minutes. We also assemble a synthetic wheat genome with 1.1 TB of raw reads on 8 nodes in 18.5 hours and on 128 nodes in 1.25 hours.
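
Partitioned de Bruijn graphs trade memory for disk: k-mers are hashed to partitions that are spilled to disk and processed one at a time, so the whole graph never has to fit in RAM. Below is a minimal sketch of that partitioning step; the file layout, hash choice, and partition count are assumptions, not Lazer's implementation.

```python
# Sketch of hash-partitioning k-mers to disk for out-of-core assembly.
import hashlib

NUM_PARTS = 4

def partition_of(kmer):
    return int(hashlib.md5(kmer.encode()).hexdigest(), 16) % NUM_PARTS

def spill_kmers(reads, k=5, prefix="part"):
    files = [open(f"{prefix}_{p}.txt", "w") for p in range(NUM_PARTS)]
    try:
        for read in reads:
            for i in range(len(read) - k + 1):
                kmer = read[i:i + k]
                files[partition_of(kmer)].write(kmer + "\n")
    finally:
        for f in files:
            f.close()

spill_kmers(["ACGTACGTGG", "GTACGTGGAT"])
# Each part_*.txt can now be loaded and processed independently.
```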

Collaboration


Dive into Richard Platania's collaboration.

Top Co-Authors

Seung-Jong Park, Louisiana State University
Joohyun Kim, Louisiana State University
Kisung Lee, Louisiana State University
Shayan Shams, Louisiana State University
Arghya Kusum Das, Louisiana State University
Nayong Kim, Louisiana State University
Sayan Goswami, Louisiana State University
Chui-Hui Chiu, Louisiana State University
Jian Zhang, Louisiana State University
Seungwon Yang, Louisiana State University