Siew Kei Lam
Nanyang Technological University
Publication
Featured research published by Siew Kei Lam.
international symposium on circuits and systems | 2003
H. Tian; Siew Kei Lam; Thambipillai Srikanthan
Otsu's global automatic image thresholding method has been widely employed in various real-time applications. In this paper, a novel architecture for the BCVC (Between-Class Variance Computation) of Otsu's method is presented to meet these high-speed requirements. The proposed implementation employs a binary Logarithmic Conversion Unit (LCU) to eliminate the complex divisions and multiplications in Otsu's procedure. Implementations on the FPGA (Field Programmable Gate Array) platform show that our method achieves a computation speed-up of about 2.75 times while occupying only 1/6th of the FPGA slices required by a direct implementation.
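As a rough software illustration of the idea in this abstract (not the paper's LCU hardware architecture), the sketch below computes Otsu's between-class variance directly and then again in the log2 domain, where the divisions and the final products turn into subtractions and additions. The function names and the small 8-bin histogram are assumptions made for the example.

```python
import math

def between_class_variance(hist, t):
    """Between-class variance for threshold t; classes are bins [0..t] and [t+1..]."""
    total = sum(hist)
    n0 = sum(hist[: t + 1])
    n1 = total - n0
    if n0 == 0 or n1 == 0:
        return 0.0
    s0 = sum(i * h for i, h in enumerate(hist[: t + 1]))
    s_all = sum(i * h for i, h in enumerate(hist))
    mu0 = s0 / n0
    mu1 = (s_all - s0) / n1
    return (n0 / total) * (n1 / total) * (mu0 - mu1) ** 2

def log2_between_class_variance(hist, t):
    """log2 of the same quantity, using the algebraically equivalent form
    (s0*N - n0*S)^2 / (N^2 * n0 * n1); returns -inf when the variance is zero."""
    total = sum(hist)
    n0 = sum(hist[: t + 1])
    n1 = total - n0
    s0 = sum(i * h for i, h in enumerate(hist[: t + 1]))
    s_all = sum(i * h for i, h in enumerate(hist))
    d = abs(s0 * total - n0 * s_all)
    if n0 == 0 or n1 == 0 or d == 0:
        return float("-inf")
    # Squaring becomes a doubling; division becomes subtraction.
    return 2 * math.log2(d) - 2 * math.log2(total) - math.log2(n0) - math.log2(n1)

def otsu_threshold(hist, score=between_class_variance):
    """Threshold that maximizes the given score (first maximum wins on ties)."""
    return max(range(len(hist)), key=lambda t: score(hist, t))

# Hypothetical bimodal histogram with 8 grey levels.
hist = [10, 20, 10, 0, 0, 10, 20, 10]
```

Because log2 is monotonic, `otsu_threshold(hist, log2_between_class_variance)` selects the same threshold as the direct computation.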
Journal of Systems Architecture | 2009
Siew Kei Lam; Thambipillai Srikanthan
RISPs (Reconfigurable Instruction Set Processors) are becoming increasingly popular as they can be customized to meet design constraints. However, existing instruction set customization methodologies do not lend themselves well to mapping custom instructions onto commercial FPGA architectures. In this paper, we propose a design exploration framework that provides for rapid identification of a reduced set of profitable custom instructions and their area costs on commercial architectures without the need for a time-consuming hardware synthesis process. A novel clustering strategy is used to estimate the utilization of LUT (Look-Up Table) based FPGAs for the chosen custom instructions. Our investigations show that the area costs computed using the proposed hardware estimation technique on 20 custom instructions are within 8% of those obtained using hardware synthesis. A systematic approach has been adopted to select the most profitable custom instruction candidates. Our investigations show that this leads to a notable reduction in the number of custom instructions with only marginal degradation in performance. Simulations based on domain-specific application sets from the MiBench and MediaBench benchmark suites show that, on average, more than 25% area utilization efficiency (performance/area) can be achieved with the proposed technique.
IEEE Transactions on Industrial Electronics | 2009
Siew Kei Lam; Thambipillai Srikanthan; Christopher T. Clarke
Profitable custom instructions provide higher performance for a given reconfigurable area. Hence, choosing profitable custom instructions that are also area-time efficient is essential if design constraints must be met by field-programmable-gate-array (FPGA)-based reconfigurable processors. In this paper, we propose a framework for FPGA-based reconfigurable processors in order to rapidly identify a reduced set of profitable custom instructions without the need for actual hardware synthesis. The proposed framework is capable of estimating the area utilization and latencies of custom instructions on lookup-table-based commercial FPGAs. Simulations based on 15 applications from benchmark suites show that the proposed method provides, on average, an area reduction of over 29% for a loss of a mere 1.3% in compute performance. Our evaluations also confirm that the proposed framework is superior to an existing area-optimization approach that relies on exploiting the regularity of custom instruction data paths. In particular, an average area-time product gain of over 59% was achieved by deploying a reduced set of custom instructions obtained using the proposed framework.
IEEE Transactions on Computers | 2011
Siew Kei Lam; Thambipillai Srikanthan; Christopher T. Clarke
Area-time efficient custom instructions are desirable for maximizing the performance of reconfigurable processors. Existing data path merging techniques based on resource sharing can be deployed to improve the area efficiency of custom instructions. However, these techniques lead to a large increase in the critical path delay. In this paper, we propose a novel strategy that takes into account the architectural constraints of the FPGA device in order to realize custom instructions with a low area-delay product. The proposed strategy is based on partitioning the custom instruction data paths into a set of basic clusters such that they can be combined using a heuristic-based cluster merging process to maximize the utilization of FPGA logic blocks. Unlike the resource sharing method, the proposed cluster merging process does not maximize sharing of common resources, and this leads to less reliance on multiplexers for implementing custom instructions. Resource sharing is applied only sparingly at the final stage to increase utilization of logic blocks. We show that the proposed technique leads to more than 34 percent, 34 percent, and 42 percent average reduction in area costs for Spartan-3, Virtex-4, and Virtex-5 architectures, respectively, when compared to optimizations achieved through a commercial synthesis tool. We have also shown that the proposed technique leads to more than 18 percent, 17 percent, and 13 percent average reduction in area costs for Spartan-3, Virtex-4, and Virtex-5, respectively, when compared to results obtained using one of the most efficient resource sharing-based methods reported in the literature. In addition, the proposed technique outperforms the resource sharing-based method in terms of area-delay product, with average reductions of more than 27 percent, 34 percent, and 19 percent for Spartan-3, Virtex-4, and Virtex-5, respectively.
IEEE Transactions on Intelligent Transportation Systems | 2015
Meiqing Wu; Siew Kei Lam; Thambipillai Srikanthan
It is well recognized that detecting the road surface in a realistic environment is a challenging problem that is also computationally intensive. Existing road surface detection methods attempt to fit the road surface into rigid models (e.g., planar, clothoid, or B-Spline), thereby restricting them to road surfaces that match specific models. In addition, the curve-fitting strategies employed in such techniques incur high computational complexity, making them unsuitable for in-vehicle deployments. In this paper, we propose an efficient nonparametric road surface detection algorithm that exploits the depth cue. The proposed method relies on four intrinsic road scene attributes observed under stereo geometry and has been shown to reliably detect both planar and nonplanar road surfaces efficiently. Extensive evaluations are performed on three widely used benchmarks (i.e., enpeda, KITTI, and Daimler), encompassing many complex road scenarios. The experimental results show that the proposed algorithm significantly outperforms well-known techniques in terms of both detection accuracy and runtime performance.
southwest symposium on image analysis and interpretation | 2002
H. Tian; Thambipillai Srikanthan; K.V. Asari; Siew Kei Lam
Videoendoscopy is becoming increasingly popular in surgical procedures. Wide-angle lenses are commonly employed in such applications for enhanced viewing capability. However, images captured with these lenses suffer from barrel distortion. Although 2D distortion correction for images captured with wide-angle lenses has been widely investigated, in clinical applications it is necessary to incorporate correction techniques that are independent of the distance from the object to the camera lens. In this paper, we prove that for a wide-angle camera lens with fixed focal length, the distortion correction coefficients remain the same for distances within the minimum and maximum range (depth of field). Experiments have also been performed to verify this.
Journal of Systems Architecture | 2010
Tao Li; Wu Jigang; Siew Kei Lam; Thambipillai Srikanthan; Xicheng Lu
Custom-instruction selection is an essential phase in instruction set extension for reconfigurable processors. It determines the most profitable custom-instruction candidates for implementation in the reconfigurable fabric of a reconfigurable processor. In this paper, a practical computing model is proposed for the custom-instruction selection problem that takes into account the area constraint of the reconfigurable fabric. Based on the new computing model, two heuristic algorithms and an exact algorithm are proposed. The first heuristic algorithm, denoted as HEA, dynamically assigns priorities to the custom instruction candidates and incorporates efficient strategies to select custom instructions with the highest priority. The second heuristic algorithm, denoted as TSA, employs an efficient tabu search algorithm to refine the results of HEA to near-optimal ones. Also, a branch-and-bound algorithm (BnB) is proposed to produce exact solutions for relatively small-sized problems or problems with stringent area constraints. Experimental results show that HEA can produce more specific approximate solutions with a difference of only about 3% when compared to the optimal solutions produced by BnB. This difference is further reduced to about 0.6% by TSA. In addition, for large-sized problems where the exact algorithm becomes prohibitive, HEA and TSA can still produce solutions within reasonable time.
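To make the trade-off between heuristic and exact selection concrete, here is a minimal sketch that treats area-constrained custom-instruction selection as a 0/1 knapsack problem. This is an assumption for illustration only: the static gain/area priority below is not the paper's HEA, the exhaustive search stands in for (and is much cruder than) its branch-and-bound algorithm, and the candidate names, gains, and areas are hypothetical.

```python
from itertools import combinations

def greedy_select(candidates, area_budget):
    """Pick candidates in descending gain/area order while they still fit.
    candidates: list of (name, gain, area) with area > 0."""
    chosen, used = [], 0
    for name, gain, area in sorted(candidates, key=lambda c: c[1] / c[2], reverse=True):
        if used + area <= area_budget:
            chosen.append(name)
            used += area
    return chosen

def exact_select(candidates, area_budget):
    """Exhaustive search over all subsets; only practical for small inputs."""
    best_gain, best = 0, []
    for r in range(len(candidates) + 1):
        for subset in combinations(candidates, r):
            area = sum(c[2] for c in subset)
            gain = sum(c[1] for c in subset)
            if area <= area_budget and gain > best_gain:
                best_gain, best = gain, [c[0] for c in subset]
    return best

def total_gain(candidates, names):
    """Sum of the gains of the named candidates."""
    return sum(gain for name, gain, _ in candidates if name in names)

# Hypothetical candidates: (name, performance gain, area cost).
candidates = [("A", 60, 10), ("B", 100, 20), ("C", 120, 30)]
budget = 50
```

With these numbers the greedy pass picks A and B (gain 160), while the exhaustive search finds B and C (gain 220), which shows why a refinement step or an exact method can still improve on a priority-based heuristic.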
field-programmable technology | 2006
Siew Kei Lam; Bharathi N. Krishnan; Thambipillai Srikanthan
The instruction set extension capability of RISPs (reconfigurable instruction set processors) provides an attractive means to meet the flexibility, performance, and cost demands of ubiquitous computing devices. Run-time reconfiguration can further increase the cost efficiency and hardware specialization of these processors by dynamically changing the configuration of the reconfigurable logic to the required functionality. In this paper, we propose the use of a heuristic that leads to the selection of large custom instructions for increased performance gain. Analysis of results for six applications from the MiBench embedded benchmark suite shows that efficient data-path merging can be applied to the custom instructions to reduce the average number of configurations to fewer than 8 in a run-time RISP. In addition, there is only a small difference in the average number of configurations when compared to a custom instruction selection strategy that results in lower performance.
IEEE Transactions on Computers | 2004
Thambipillai Srikanthan; Siew Kei Lam; Mishra Suman
Computer arithmetic operations based on the BSD (binary signed-digit) number representation system lend themselves well to high-speed computations because they facilitate limited-carry addition/subtraction. We propose an area-time efficient method for sign detection in a BSD number system based on an optimized reverse tree structure. When compared to other popular approaches, such as the most significant carry detection-based CLA (carry look-ahead) and MRC (multilevel reverse carry) implementations, the proposed method is superior in both area and time costs in VLSI. Synthesis results for different word lengths show that the proposed approach to sign detection in the BSD number system continues to maintain its advantage in area and time measures.
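For readers unfamiliar with the problem, the sign of a BSD number with digits in {-1, 0, 1} equals the sign of its most significant nonzero digit, since that digit's weight exceeds the combined weight of all lower digits. The sketch below finds it with a pairwise tree reduction; this is a generic illustration under that assumption, not the optimized reverse tree structure proposed in the paper, and all function names are hypothetical.

```python
from itertools import product

def bsd_value(digits):
    """Integer value of a BSD number given most-significant digit first."""
    value = 0
    for d in digits:
        value = 2 * value + d
    return value

def combine(hi, lo):
    """The more significant digit decides the sign unless it is zero."""
    return hi if hi != 0 else lo

def bsd_sign_tree(digits):
    """Sign (-1, 0, or 1) via pairwise reduction, one tree level per loop pass."""
    level = list(digits)
    if not level:
        return 0
    while len(level) > 1:
        nxt = [combine(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # unpaired least significant entry carries over
        level = nxt
    return level[0]

def sign(x):
    """Sign of an ordinary integer, used only to check the result."""
    return (x > 0) - (x < 0)
```

Because `combine` is associative, the reduction can be grouped in any order, so an n-digit word needs about log2(n) levels rather than a linear scan.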
international symposium on circuits and systems | 2012
Meiqing Wu; Nirmala Ramakrishnan; Siew Kei Lam; Thambipillai Srikanthan
In this paper, we present a novel and computationally efficient pruning technique to speed up the Shi-Tomasi and Harris corner detectors. The proposed technique quickly prunes non-corners and selects a small corner candidate set by approximating the complex corner measure of Shi-Tomasi and Harris. The actual corner measure is then applied only to the reduced candidate set. Experimental results on the Nios II platform show that the proposed technique achieves average execution-time savings of 90% for the Shi-Tomasi detector and 70% for the Harris detector for 500 corners, with no loss in accuracy.
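The prune-then-verify pattern described above can be sketched as follows. The abstract does not give the paper's approximation, so the cheap test used here is an assumption chosen for illustration: for a symmetric structure tensor [[a, b], [b, c]], the smaller eigenvalue never exceeds min(a, c), so any pixel with min(a, c) below the threshold can be discarded before the square root in the Shi-Tomasi measure is evaluated. The tensor values and function names are hypothetical.

```python
import math

def shi_tomasi(a, b, c):
    """Shi-Tomasi measure: smaller eigenvalue of the tensor [[a, b], [b, c]]."""
    return 0.5 * ((a + c) - math.sqrt((a - c) ** 2 + 4 * b * b))

def harris(a, b, c, k=0.04):
    """Harris response det(M) - k * trace(M)^2, shown for comparison."""
    return (a * c - b * b) - k * (a + c) ** 2

def select_corners(tensors, threshold):
    """Two-stage selection: cheap upper-bound prune, then the exact measure.
    tensors: dict mapping pixel -> (a, b, c)."""
    candidates = {p: t for p, t in tensors.items() if min(t[0], t[2]) >= threshold}
    return {p for p, (a, b, c) in candidates.items() if shi_tomasi(a, b, c) >= threshold}

def select_corners_full(tensors, threshold):
    """Reference: exact measure on every pixel, no pruning."""
    return {p for p, (a, b, c) in tensors.items() if shi_tomasi(a, b, c) >= threshold}

# Hypothetical structure tensors for three pixels.
tensors = {(0, 0): (10.0, 0.0, 10.0), (0, 1): (10.0, 0.0, 0.1), (1, 0): (5.0, 4.5, 5.0)}
```

Because the prune relies on an upper bound, it never discards a pixel that the exact measure would accept, so both functions return the same corner set.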