Shiyong Lu
Wayne State University
Publication
Featured research published by Shiyong Lu.
grid computing environments | 2008
Ian T. Foster; Yong Zhao; Ioan Raicu; Shiyong Lu
Cloud computing has become another buzzword after Web 2.0. However, there are dozens of different definitions for cloud computing and there seems to be no consensus on what a cloud is. On the other hand, cloud computing is not a completely new concept; it has intricate connection to the relatively new but thirteen-year established grid computing paradigm, and other relevant technologies such as utility computing, cluster computing, and distributed systems in general. This paper strives to compare and contrast cloud computing with grid computing from various angles and give insights into the essential characteristics of both.
BMC Bioinformatics | 2004
Yi Lu; Shiyong Lu; Farshad Fotouhi; Youping Deng; Susan J. Brown
Background: In recent years, clustering algorithms have been effectively applied in molecular biology for gene expression data analysis. With the help of clustering algorithms such as K-means, hierarchical clustering, SOM, etc., genes are partitioned into groups based on the similarity between their expression profiles. In this way, functionally related genes are identified. As the amount of laboratory data in molecular biology grows exponentially each year due to advanced technologies such as Microarray, new efficient and effective methods for clustering must be developed to process this growing amount of biological data. Results: In this paper, we propose a new clustering algorithm, Incremental Genetic K-means Algorithm (IGKA). IGKA is an extension to our previously proposed clustering algorithm, the Fast Genetic K-means Algorithm (FGKA). IGKA outperforms FGKA when the mutation probability is small. The main idea of IGKA is to calculate the objective value Total Within-Cluster Variation (TWCV) and to cluster centroids incrementally whenever the mutation probability is small. IGKA inherits the salient feature of FGKA of always converging to the global optimum. A C program is freely available at http://database.cs.wayne.edu/proj/FGKA/index.htm. Conclusions: Our experiments indicate that, while the IGKA algorithm has a convergence pattern similar to FGKA, it has a better time performance when the mutation probability decreases to some point. Finally, we used IGKA to cluster a yeast dataset and found that it increased the enrichment of genes of similar function within the cluster.
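The incremental idea in the abstract above can be illustrated with a small sketch. It is not the authors' C program: the class name `IncrementalTWCV`, its methods, and the helper `twcv_from_scratch` are hypothetical, and the sketch only relies on the standard identity TWCV = (sum of squared feature values) minus the sum over clusters of (squared norm of the cluster's feature sums divided by the cluster size). Under that assumption, moving one point between clusters changes only two cluster terms, so the objective and centroids can be updated without rescanning the data.

```python
from typing import List


def twcv_from_scratch(points: List[List[float]], labels: List[int], k: int) -> float:
    """Reference computation: sum of squared distances to each cluster centroid."""
    total = 0.0
    for c in range(k):
        members = [x for x, lab in zip(points, labels) if lab == c]
        if not members:
            continue
        centroid = [sum(col) / len(members) for col in zip(*members)]
        total += sum((v - m) ** 2 for x in members for v, m in zip(x, centroid))
    return total


class IncrementalTWCV:
    """Keeps per-cluster feature sums and sizes so a single move costs O(D)."""

    def __init__(self, points: List[List[float]], labels: List[int], k: int) -> None:
        self.points = points
        self.labels = list(labels)
        dims = len(points[0])
        self.sums = [[0.0] * dims for _ in range(k)]
        self.counts = [0] * k
        sq_total = 0.0
        for x, c in zip(points, self.labels):
            self.counts[c] += 1
            for d, v in enumerate(x):
                self.sums[c][d] += v
                sq_total += v * v
        # Cached objective value, kept up to date by move().
        self.value = sq_total - sum(self._term(c) for c in range(k))

    def _term(self, c: int) -> float:
        """Squared norm of the cluster's feature sums divided by its size."""
        if self.counts[c] == 0:
            return 0.0
        return sum(s * s for s in self.sums[c]) / self.counts[c]

    def move(self, i: int, new_c: int) -> None:
        """Reassign point i (e.g. after a mutation) and update TWCV incrementally."""
        old_c = self.labels[i]
        if old_c == new_c:
            return
        before = self._term(old_c) + self._term(new_c)
        for d, v in enumerate(self.points[i]):
            self.sums[old_c][d] -= v
            self.sums[new_c][d] += v
        self.counts[old_c] -= 1
        self.counts[new_c] += 1
        self.labels[i] = new_c
        after = self._term(old_c) + self._term(new_c)
        self.value -= after - before

    def centroid(self, c: int) -> List[float]:
        """Centroid derived from the running sums (cluster must be non-empty)."""
        return [s / self.counts[c] for s in self.sums[c]]
```

When few points change cluster per generation, as happens with a small mutation probability, calling `move` for each changed point is cheaper than recomputing the objective over the whole dataset, which is the situation in which the abstract reports IGKA outperforming FGKA.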
acm symposium on applied computing | 2004
Yi Lu; Shiyong Lu; Farshad Fotouhi; Youping Deng; Susan J. Brown
In this paper, we propose a new clustering algorithm called Fast Genetic K-means Algorithm (FGKA). FGKA is inspired by the Genetic K-means Algorithm (GKA) proposed by Krishna and Murty in 1999 but features several improvements over GKA. Our experiments indicate that, while the K-means algorithm might converge to a local optimum, both FGKA and GKA always converge to the global optimum eventually, but FGKA runs much faster than GKA.
IEEE Transactions on Services Computing | 2009
Cui Lin; Shiyong Lu; Xubo Fei; Artem Chebotko; Darshan Pai; Zhaoqiang Lai; Farshad Fotouhi; Jing Hua
Scientific workflows have recently emerged as a new paradigm for scientists to formalize and structure complex and distributed scientific processes to enable and accelerate many scientific discoveries. In contrast to business workflows, which are typically control flow oriented, scientific workflows tend to be dataflow oriented, introducing a new set of requirements for system development. These requirements demand a new architectural design for scientific workflow management systems (SWFMSs). Although several SWFMSs have been developed that provide much experience for future research and development, a study from an architectural perspective is still missing. The main contributions of this paper are: 1) based on a comprehensive survey of the literature and identification of key requirements for SWFMSs, we propose the first reference architecture for SWFMSs; 2) according to the reference architecture, we further propose a service-oriented architecture for View (a VIsual sciEntific Workflow management system); 3) we implemented View to validate the feasibility of the proposed architectures; and 4) we present a View-based scientific workflow application system (SWFAS), called FiberFlow, to showcase the application of our View system.
data and knowledge engineering | 2009
Artem Chebotko; Shiyong Lu; Farshad Fotouhi
Most existing RDF stores, which serve as metadata repositories on the Semantic Web, use an RDBMS as a backend to manage RDF data. This motivates us to study the problem of translating SPARQL queries into equivalent SQL queries, which further can be optimized and evaluated by the relational query engine and their results can be returned as SPARQL query solutions. The main contributions of our research are: (i) We formalize a relational algebra based semantics of SPARQL, which bridges the gap between SPARQL and SQL query languages, and prove that our semantics is equivalent to the mapping-based semantics of SPARQL; (ii) Based on this semantics, we propose the first provably semantics preserving SPARQL-to-SQL translation for SPARQL triple patterns, basic graph patterns, optional graph patterns, alternative graph patterns, and value constraints; (iii) Our translation algorithm is generic and can be directly applied to existing RDBMS-based RDF stores; and (iv) We outline a number of simplifications for the SPARQL-to-SQL translation to generate simpler and more efficient SQL queries and extend our defined semantics and translation to support the bag semantics of a SPARQL query solution. The experimental study showed that our proposed generic translation can serve as a good alternative to existing schema dependent translations in terms of efficient query evaluation and/or ensured query result correctness.
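To make the translation problem above concrete, here is a deliberately naive sketch that handles only basic graph patterns. It is not the paper's algorithm and makes no claim of being semantics preserving: it assumes a single hypothetical table `Triple(s, p, o)`, treats terms starting with `?` as variables, and the function names `bgp_to_sql` and `run_example` are invented for illustration. Each triple pattern becomes one alias of the table, constants become equality filters, and repeated variables become join conditions.

```python
import sqlite3
from typing import Dict, List, Tuple

Pattern = Tuple[str, str, str]  # (subject, predicate, object); "?x" marks a variable


def bgp_to_sql(patterns: List[Pattern], table: str = "Triple") -> Tuple[str, List[str]]:
    """Translate a basic graph pattern into a parameterized SQL self-join."""
    first_seen: Dict[str, str] = {}  # variable -> "alias.column" of its first use
    froms: List[str] = []
    where: List[str] = []
    params: List[str] = []
    for i, pattern in enumerate(patterns):
        alias = f"t{i}"
        froms.append(f"{table} AS {alias}")
        for column, term in zip(("s", "p", "o"), pattern):
            ref = f"{alias}.{column}"
            if term.startswith("?"):
                if term in first_seen:
                    # Shared variable: join with its first occurrence.
                    where.append(f"{ref} = {first_seen[term]}")
                else:
                    first_seen[term] = ref
            else:
                # Constant term: bind it as a query parameter.
                where.append(f"{ref} = ?")
                params.append(term)
    if not first_seen:
        raise ValueError("this sketch expects at least one variable")
    select = ", ".join(f'{ref} AS "{var[1:]}"' for var, ref in first_seen.items())
    sql = f"SELECT {select} FROM {', '.join(froms)}"
    if where:
        sql += " WHERE " + " AND ".join(where)
    return sql, params


def run_example() -> List[Tuple[str, str]]:
    """Evaluate { ?x type Person . ?x name ?n } over a tiny in-memory store."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Triple (s TEXT, p TEXT, o TEXT)")
    conn.executemany(
        "INSERT INTO Triple VALUES (?, ?, ?)",
        [
            ("alice", "type", "Person"), ("alice", "name", "Alice"),
            ("bob", "type", "Person"), ("bob", "name", "Bob"),
            ("acme", "type", "Company"), ("acme", "name", "Acme"),
        ],
    )
    sql, params = bgp_to_sql([("?x", "type", "Person"), ("?x", "name", "?n")])
    rows = sorted(conn.execute(sql, params).fetchall())
    conn.close()
    return rows
```

Optional patterns, alternative patterns, and value constraints, which the abstract lists among the translated constructs, would need outer joins, unions, and filter expressions respectively and are left out of this sketch.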
international conference on cloud computing | 2011
Cui Lin; Shiyong Lu
Most existing workflow scheduling algorithms only consider a computing environment in which the number of compute resources is bounded. Compute resources in such an environment usually cannot be provisioned or released on demand according to the size of a workflow, and these resources are not released to the environment until an execution of the workflow completes. To address the problem, we first formalize a model of a Cloud environment and a workflow graph representation for such an environment. Then, we propose the SHEFT workflow scheduling algorithm to schedule a workflow elastically on a Cloud computing environment. Our preliminary experiments show that SHEFT not only outperforms several representative workflow scheduling algorithms in optimizing workflow execution time, but also enables resources to scale elastically at runtime.
ieee international conference on services computing | 2008
Cui Lin; Shiyong Lu; Zhaoqiang Lai; Artem Chebotko; Xubo Fei; Jing Hua; Farshad Fotouhi
Scientific workflows have recently emerged as a new paradigm for scientists to formalize and structure complex and distributed scientific processes to enable and accelerate many scientific discoveries. In contrast to business workflows, which are typically control flow oriented, scientific workflows tend to be dataflow oriented, introducing a new set of requirements for system development. These requirements demand a new architectural design for scientific workflow management systems (SWFMSs). Although several SWFMSs have been developed that provide much experience for future research and development, a study from an architectural perspective is still missing. The main contributions of this paper are: (i) based on a comprehensive survey of the literature and identification of key requirements for SWFMSs, we propose the first reference architecture for SWFMSs, (ii) in compliance with the reference architecture, we further propose a service-oriented architecture for VIEW (a VIsual sciEntific Workflow management system), (iii) we implement VIEW to validate the feasibility of the proposed architectures, and (iv) we present two case studies to showcase the applications of our VIEW system.
Information Systems | 2007
Mustafa Atay; Artem Chebotko; Dapeng Liu; Shiyong Lu; Farshad Fotouhi
Storing and querying XML documents using an RDBMS is a challenging problem since one needs to resolve the conflict between the hierarchical, ordered nature of the XML data model and the flat, unordered nature of the relational data model. This conflict can be resolved by the following XML-to-Relational mappings: schema mapping, data mapping and query mapping. In this paper, we propose: (i) a lossless schema mapping algorithm to generate a database schema from a DTD, which makes several improvements over existing algorithms, and (ii) two linear data mapping algorithms based on DOM and SAX, respectively, to map ordered XML data to relational data. To the best of our knowledge, there is no published linear schema-based data mapping algorithm for mapping ordered XML data to relational data. Experimental results are presented to show that our algorithms are efficient and scalable.
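The tension between ordered XML and unordered relations can be shown with a short single-pass sketch. It is not the paper's schema-based algorithm: instead of a schema generated from a DTD, it uses a generic, hypothetical "edge table" layout `(id, parent_id, ord, tag, text)`, where `ord` records each element's position among its siblings so that document order survives in flat rows. The names `EdgeTableHandler` and `xml_to_rows` are assumptions made for illustration; the parsing itself uses Python's standard `xml.sax` module.

```python
import xml.sax
from typing import List, Optional, Tuple

Row = Tuple[int, Optional[int], int, str, str]  # (id, parent_id, ord, tag, text)


class EdgeTableHandler(xml.sax.ContentHandler):
    """Emits one row per element in a single SAX pass over the document."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[list] = []
        self._open: List[int] = []        # ids of currently open elements
        self._child_counts: List[int] = [0]  # children seen so far at each open level

    def startElement(self, name, attrs) -> None:
        parent_id = self._open[-1] if self._open else None
        self._child_counts[-1] += 1       # position of this element among its siblings
        node_id = len(self.rows) + 1
        self.rows.append([node_id, parent_id, self._child_counts[-1], name, ""])
        self._open.append(node_id)
        self._child_counts.append(0)

    def characters(self, content) -> None:
        if self._open:
            # Attach character data to the innermost open element.
            self.rows[self._open[-1] - 1][4] += content

    def endElement(self, name) -> None:
        self._open.pop()
        self._child_counts.pop()


def xml_to_rows(xml_text: str) -> List[Row]:
    """Parse an XML string and return its elements as order-preserving rows."""
    handler = EdgeTableHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return [(i, p, o, tag, text.strip()) for i, p, o, tag, text in handler.rows]
```

Because the handler keeps only the stack of open elements, its memory use depends on document depth rather than document size, which is the usual reason to prefer a SAX-style pass over building a full DOM tree for large inputs.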
IEEE Transactions on Parallel and Distributed Systems | 2011
Dharma Teja Nukarapu; Bin Tang; Liqiang Wang; Shiyong Lu
Data replication has been well adopted in data intensive scientific applications to reduce data file transfer time and bandwidth consumption. However, the problem of data replication in Data Grids, an enabling technology for data intensive applications, has proven to be NP-hard and even non approximable, making this problem difficult to solve. Meanwhile, most of the previous research in this field is either theoretical investigation without practical consideration, or heuristics-based with little or no theoretical performance guarantee. In this paper, we propose a data replication algorithm that not only has a provable theoretical performance guarantee, but also can be implemented in a distributed and practical manner. Specifically, we design a polynomial time centralized replication algorithm that reduces the total data file access delay by at least half of that reduced by the optimal replication solution. Based on this centralized algorithm, we also design a distributed caching algorithm, which can be easily adopted in a distributed environment such as Data Grids. Extensive simulations are performed to validate the efficiency of our proposed algorithms. Using our own simulator, we show that our centralized replication algorithm performs comparably to the optimal algorithm and other intuitive heuristics under different network parameters. Using GridSim, a popular distributed Grid simulator, we demonstrate that the distributed caching technique significantly outperforms an existing popular file caching technique in Data Grids, and it is more scalable and adaptive to the dynamic change of file access patterns in Data Grids.
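The replication problem described above can be sketched as a simple greedy placement loop. This is an assumed heuristic for illustration only, with no claim that it matches the paper's centralized algorithm or its performance guarantee. The model is hypothetical: files have unit size, `dist[a][b]` is the access delay between sites, `demand[site][file]` is an access frequency, `origin[file]` is the site holding the original copy, and `capacity[site]` counts the replica slots available beyond any originals.

```python
from typing import List, Set, Tuple


def total_delay(dist: List[List[float]], demand: List[List[float]],
                holders: List[Set[int]]) -> float:
    """Total access delay when each request is served by the nearest copy."""
    return sum(
        freq * min(dist[site][h] for h in holders[f])
        for site, row in enumerate(demand)
        for f, freq in enumerate(row)
    )


def greedy_replication(dist: List[List[float]], demand: List[List[float]],
                       origin: List[int], capacity: List[int]
                       ) -> Tuple[List[Tuple[int, int]], float]:
    """Repeatedly place the (site, file) replica that reduces total delay the most."""
    holders = [{o} for o in origin]   # sites holding a copy of each file
    free = list(capacity)             # remaining replica slots per site
    placements: List[Tuple[int, int]] = []
    current = total_delay(dist, demand, holders)
    while True:
        best_gain, best = 0.0, None
        for site in range(len(dist)):
            if free[site] == 0:
                continue
            for f in range(len(origin)):
                if site in holders[f]:
                    continue
                # Tentatively add the replica and measure the delay reduction.
                holders[f].add(site)
                gain = current - total_delay(dist, demand, holders)
                holders[f].remove(site)
                if gain > best_gain:
                    best_gain, best = gain, (site, f)
        if best is None:              # no slot left or no replica helps
            break
        site, f = best
        holders[f].add(site)
        free[site] -= 1
        current -= best_gain
        placements.append(best)
    return placements, current
```

For example, with three sites on a line (`dist = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]`), both files originating at site 0, and one free slot at each of sites 1 and 2, the loop first serves the heaviest demand by replicating file 0 at site 2 and then uses the remaining slot at site 1 for file 1.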
modeling analysis and simulation on computer and telecommunication systems | 1999
Shiyong Lu; Scott A. Smolka
We use model checking to establish five essential correctness properties of the secure electronic transaction (SET) protocol. SET has been developed jointly by Visa and MasterCard as a method to secure payment card transactions over open networks, and industrial interest in the protocol is high. Our main contributions are as follows. First, we create a formal model of the protocol capturing the purchase request, payment authorization, and payment capture transactions. Together these transactions constitute the kernel of the protocol. We then encode our model and the aforementioned correctness properties in the input language of the FDR model checker. Running FDR on this input established that our model of the SET protocol satisfies all five properties even though the cardholder and merchant, two of the participants in the protocol, may try to behave dishonestly in certain ways. To our knowledge, this is the first attempt to formalize the SET protocol for the purpose of model checking.