Wei Ming Lin

University of Texas at San Antonio

Publication

Featured research published by Wei Ming Lin.

IEEE Transactions on Computers | 1991

Algorithmic mapping of neural network models onto parallel SIMD machines

Wei Ming Lin; Viktor K. Prasanna; K.W. Przytula

Implementations of neural networks on programmable massively parallel computers are addressed. The methods are based on a graph theoretic approach and are applicable to a large class of networks in which the computations can be described by means of matrix and vector operations. A detailed characterization of the target machine is provided. Two mappings are presented. The first is designed for a processor array consisting of a very large number of small processing units. The neurons and the nonzero synaptic weights are assigned to the processors in a predetermined order, one per processor. The data transfers between processors containing neurons and weights are implemented using a novel routing algorithm. The second mapping is designed for the data array of size N*N and a smaller processor array of size P*P, P >

Graphical Models / Graphical Models and Image Processing / Computer Vision, Graphics, and Image Processing | 1990

Efficient histogramming on hypercube SIMD machines

Wei Ming Lin; V. K. Prasanna Kumar

This paper considers the histogramming problem on a hypercube. An N-PE hypercube is used to process an N^(1/2) × N^(1/2) digitized image in which each pixel has a gray-level value between 0 and M − 1. In general, M, the range of gray-level values, is much smaller than N, the number of pixels being processed. Our algorithm generates the histogram of the image in O(log M * log N) time using radix sort and efficient data movement operations. This technique can be implemented on butterfly, shuffle-exchange and fat pyramid organizations.
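As a rough illustration of the sort-then-count idea mentioned in this abstract, here is a minimal sequential sketch in Python. It is not the hypercube algorithm from the paper: the built-in `sorted` merely stands in for the parallel radix sort, and the function name `histogram_by_sorting` and the sample image are hypothetical.

```python
def histogram_by_sorting(pixels, num_levels):
    """Sequential sketch: sort the gray levels, then count each run of equal values.

    `pixels` is a flat list of integers in the range 0..num_levels-1.
    """
    counts = [0] * num_levels
    ordered = sorted(pixels)  # stands in for the parallel radix sort (assumption)
    run_start = 0
    for i in range(1, len(ordered) + 1):
        # A run ends at the end of the data or where the value changes.
        if i == len(ordered) or ordered[i] != ordered[run_start]:
            counts[ordered[run_start]] = i - run_start
            run_start = i
    return counts


# Hypothetical 4 x 4 image with M = 4 gray levels (N = 16 pixels).
image = [
    [0, 1, 1, 3],
    [2, 1, 0, 3],
    [3, 3, 1, 0],
    [2, 2, 2, 1],
]
flat = [p for row in image for p in row]
hist = histogram_by_sorting(flat, 4)
print(hist)
```

After sorting, equal gray levels sit next to each other, so each histogram bin is simply the length of one run. That is why a fast sort plus cheap data movement is enough to build the histogram.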

IEEE International Conference on Cloud Computing Technology and Science | 2014

RMCC: Restful Mobile Cloud Computing Framework for Exploiting Adjacent Service-Based Mobile Cloudlets

Saeid Abolfazli; Zohreh Sanaei; Abdullah Gani; Feng Xia; Wei Ming Lin

Mobile devices, especially smartphones, are increasingly gaining ground in several domains, particularly healthcare, tele-monitoring, and education, to perform Resource-intensive Mobile Applications (RiMA). However, constrained resources, especially CPU and battery, hinder their successful adoption. Mobile Cloud Computing (MCC) aims to augment the computational capabilities of resource-constrained mobile devices and conserve their native resources by remotely performing intensive tasks. In typical MCC solutions, intensive tasks are offloaded to distant VM-based cloud data centers or cloudlets, whose exploitation incurs long WAN latency and/or virtualization overhead, degrading RiMA execution efficiency. In this paper, a lightweight Resource-oriented MCC (RMCC) architecture is proposed that exploits the resources of a plethora of Adjacent Service-based Mobile Cloudlets (ASMobiC) as fine-grained mobile service providers. In RMCC, ASMobiCs host prefabricated Restful services to be asynchronously called by mobile service consumers at runtime. RMCC is a Restful cross-platform architecture functional on major mobile OSs (e.g., Android and iOS) and realizes utilization of the computing resources of off-the-shelf outdated or damaged-yet-functioning mobile devices towards green MCC. Benchmarking results show significant mean time and energy savings of 87% and 71.45%, respectively, when intensive tasks are executed in ASMobiCs.

IEEE/ACM Transactions on Networking | 2009

On designing fast nonuniformly distributed IP address lookup hashing algorithms

Christopher J. Martinez; Devang Pandya; Wei Ming Lin

Computer networks have continued to make substantial advances in the past couple of decades through better technologies and methodologies. As the usage of the networks continues to increase exponentially, high throughput of the networks has to be maintained with various performance-efficient network algorithms. IP address lookup is one such process whose performance greatly affects the overall network performance. Hashing has been widely used for fast IP address lookup due to its simplicity, but mostly under the assumption that the address set has uniformly distributed key values. Performance from these known hashing techniques is far from optimal due to the high nonuniformity in actual IP address distribution. In this paper, we propose a preprocessing method for the IP address databases to extract certain regularity to allow for design of more efficient hashing algorithms based on XOR operations. Simulation results show an improvement in performance ranging from 35% to 72% on randomly generated addresses and several sample IP address databases. The paper also shows that the proposed algorithms deliver comparable performance to other well-known hashing algorithms such as the CRC and RS hashing while requiring much less hardware to implement and a much shorter time to perform.

ACM Symposium on Applied Computing | 1997

An efficient processor partitioning and thread mapping strategy for mesh-connected multiprocessor systems

Hong Liu; Wei Ming Lin; Yongsheng Song

This paper considers the problem of processor partitioning and task mapping for large scale mesh networks. A simple adaptive partitioning and dynamic allocation strategy is proposed to provide constant close-to-equal resource allocation and, at the same time, to minimize communication distance within each partition. Several natural performance metrics are introduced to gauge fairness achieved in partitioning results in terms of resource allocation and expected communication cost. Compared with known techniques, results delivered by the proposed technique show mostly better performance readings and a much steadier performance spectrum when the number of partitions changes. In addition, the technique is applicable to all sizes of mesh.

Computer Vision and Pattern Recognition | 1991

Parallel algorithms and architectures for discrete relaxation technique

Wei Ming Lin; V. K. Prasanna Kumar

Three parallel implementations based on three alternate sequential approaches are presented. The first design is a systolic array based on the known sequential method. An execution time of O(n^2 m^2) is achieved with nm processing elements (PEs), with each PE composed of simple logic elements. The second design employs broadcast bus feature to speed up the execution of an alternate sequential method. Linear speedup is achieved by using nm processing elements. The sequential method has an execution time of O(n^2 m^2) and the proposed parallel design runs in O(nm) time. The third design is a modified approach which is well suited for implementation on general-purpose machines. These designs achieve superior performance compared with the existing designs in terms of their simplicity, execution time, and domain of applications. Using the proposed designs, an efficient parallel implementation of stereo matching based on linear segments as primitives is derived.

International Conference on Networks | 2006

Adaptive Hashing for IP Address Lookup in Computer Networks

Christopher J. Martinez; Wei Ming Lin

For applications that rely on large databases as the core data structure, a fast search process is essential. Hashing algorithms have widely been adopted as the search algorithm of choice for fast lookups. Hashing algorithms involve the creation of hash values from the target database entries. A hashing algorithm that transforms the database to hash values with a distribution as uniform as possible would lead to a better search performance. When a database is already value-wise uniformly distributed, any regular hashing algorithm, such as bit-extraction, group-XOR, etc., will lead to a statistically perfect hashing result. In almost all known practical applications, the target database rarely exhibits a uniform distribution. The use of any known regular hashing algorithm can lead to a performance far less than desirable. This paper aims at designing a hashing algorithm that can deliver a better performance for all practical databases. An analytical preprocess is performed on the original database to extract critical information that would significantly benefit the design of a better hashing algorithm. The process includes sorting database hash bits to provide a priority that would facilitate the decision-making on which bits and how these bits should be combined to generate better hash values. The algorithm follows an ad hoc design that is critical to adapting to real-time situations in which the database changes and has an irregular non-uniform distribution. The proposed technique clearly outperforms all known regular hashing algorithms by a significant margin.
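The two "regular" hashing schemes named in this abstract can be sketched as follows for 32-bit IPv4 addresses. This is only one common reading of bit-extraction and group-XOR, assumed here for illustration. It is not the adaptive algorithm proposed in the paper, and the function names and the sample address are hypothetical.

```python
import ipaddress


def bit_extraction_hash(addr, bits):
    """Keep only the low-order `bits` bits of the address (assumed variant)."""
    return addr & ((1 << bits) - 1)


def group_xor_hash(addr, bits):
    """Split the address into `bits`-wide groups and XOR the groups together."""
    mask = (1 << bits) - 1
    h = 0
    while addr:
        h ^= addr & mask  # fold the lowest remaining group into the hash
        addr >>= bits     # move on to the next group
    return h


# Hypothetical sample address: 192.168.1.10 is 0xC0A8010A as an integer.
addr = int(ipaddress.IPv4Address("192.168.1.10"))
print(bit_extraction_hash(addr, 8))  # low byte only
print(group_xor_hash(addr, 8))       # XOR of all four bytes
```

Bit-extraction ignores every bit outside the chosen field, while group-XOR lets every address bit influence the hash value. With a skewed address set, which bits are chosen and how they are combined determines how evenly the buckets fill, which is the design question the abstract raises.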

World Automation Congress | 2014

Improving IPC in simultaneous multi-threading (SMT) processors by capping IQ utilization according to dispatched memory instructions

Amin Sahba; Ramin Sahba; Wei Ming Lin

Simultaneous multithreading (SMT) provides a method to improve resource utilization and performance of superscalar CPUs by sharing key data-path components among multiple independent threads. As threads have unstable behavior, effective use of critical resources among threads is a challenge to SMT. One of the most critical shared resources in the pipeline is the Issue Queue (IQ), so putting a limit on its occupation by each thread can improve the overall throughput; however, to accommodate the transient behavior of each thread, the limit (cap) should be set properly in real time in order to preclude under-utilization (thus, under-achieving) due to over-capping, or starvation for some threads due to under-capping. In this paper, a simple dynamic algorithm is proposed to adjust the cap value for each thread in real time according to the number of memory instructions of each thread. The simulation results show that the proposed method achieves a considerable improvement in IPC over the regular no-capping technique and even a performance superior to the fixed capping approach.

IEEE Transactions on Computers | 1990

A note on the linear transformation method for systolic array design

Wei Ming Lin; V.K.P. Kumar

The use of the linear transformation method to systolize the Warshall algorithm for computing the transitive closure of a graph on a mesh-connected array (without wraparound connections) is discussed. The technique is extended to design linear systolic arrays. The advantage of this approach is easy verification of correctness, as well as synthesis of a family of arrays with tradeoffs between I/O bandwidth, number of processing elements, and local storage. The technique can be further refined to cope with problems that entail nonconstant dependency vectors.
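For reference, the sequential Warshall algorithm that this note maps onto systolic arrays can be sketched as below. This is the textbook sequential form only. It does not show the linear transformation or the array design, and the function name and the sample graph are hypothetical.

```python
def transitive_closure(adj):
    """Warshall's algorithm on a boolean adjacency matrix (list of lists).

    Returns a new matrix where reach[i][j] is True when a path of one or
    more edges leads from vertex i to vertex j.
    """
    n = len(adj)
    reach = [row[:] for row in adj]  # copy so the input is left untouched
    for k in range(n):               # allow vertex k as an intermediate stop
        for i in range(n):
            if reach[i][k]:
                for j in range(n):
                    if reach[k][j]:
                        reach[i][j] = True
    return reach


# Hypothetical 4-vertex graph: 0 -> 1 -> 2, with vertex 3 isolated.
adj = [
    [False, True, False, False],
    [False, False, True, False],
    [False, False, False, False],
    [False, False, False, False],
]
closure = transitive_closure(adj)
print(closure[0][2])  # vertex 2 is reachable from vertex 0 through vertex 1
```

The three nested loops have a regular dependence structure in the indices k, i and j, which is the kind of structure a linear transformation method works from when assigning computations to processing elements and time steps.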

International Conference on Information Technology: New Generations | 2014

A Real-Time Per-Thread IQ-Capping Technique for Simultaneous Multi-threading (SMT) Processors

Amin Sahba; Yilin Zhang; Marcus Hays; Wei Ming Lin

Effective use of critical resources among threads remains a challenge to simultaneous multithreading (SMT) due to transient behaviors of threads. As the Issue Queue (IQ) is regarded as one of the most critical shared resources in the pipeline, putting a limit on its occupation by each thread may easily improve the overall throughput; however, such a limit (cap) should be set properly in real time to accommodate the transient behavior of each thread. We propose a simple dynamic algorithm to adjust the cap value for each thread in real time according to its activeness in terms of its dispatching and issuing activities. The simulation results show that the proposed technique not only achieves a significant improvement in IPC over the regular no-capping technique, but also demonstrates a performance superior to the fixed capping approach.